NONCONVEX SET INTERSECTION PROBLEMS:

FROM PROJECTION METHODS TO THE NEWTON METHOD 
FOR SUPER-REGULAR SETS 


C.H. JEFFREY PANG 


Abstract. The problem of finding a point in the intersection of closed sets
can be solved by the method of alternating projections and its variants. It was
shown in earlier papers that for convex sets, the strategy of using quadratic
programming (QP) to project onto the intersection of supporting halfspaces
generated earlier by the projection process can lead to an algorithm that
converges multiple-term superlinearly. The main contributions of this paper are
to show that this strategy can be effective for super-regular sets, which are
structured nonconvex sets introduced by Lewis, Luke and Malick. Manifolds
should be approximated by hyperplanes rather than halfspaces. We prove the
linear convergence of this strategy, and then prove that superlinear and
quadratic convergence can be obtained when the problem is similar to the
setting of the Newton method. We also present an algorithm that converges at
an arbitrarily fast linear rate if halfspaces from older iterations are used to
construct the QP.
1. Introduction

For finitely many closed sets $K_1,\dots,K_m$ in $\mathbb{R}^n$, the Set Intersection Problem
(SIP) is stated as:

\[
\text{(SIP)}:\quad \text{Find } x\in K := \bigcap_{l=1}^m K_l, \text{ where } K\neq\emptyset. \tag{1.1}
\]
One assumption on the sets $K_l$ is that projecting a point in $\mathbb{R}^n$ onto each $K_l$ is a
relatively easy problem.

A popular method of solving the SIP is the Method of Alternating Projections
(MAP), where one iteratively projects a point through the sets $K_l$ to find a point
in $K$. For more on the background and recent developments of the MAP and its
variants, we refer the reader to [BB96, BR09, ER11], as well as [Deu01, Chapter 9]
and [BZ05, Subsubsection 4.5.4]. We refer to the references mentioned earlier for a
commentary on the applications of the SIP for the convex case (i.e., when all the
sets $K_l$ in (1.1) are convex).
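To make the MAP concrete, here is a minimal Python sketch (assuming NumPy) for two sets in $\mathbb{R}^2$ chosen purely for illustration: the unit circle, which is nonconvex, and the line $\{x : x_2 = 0.5\}$. The helper names `proj_circle`, `proj_line` and `alternating_projections` are hypothetical and are not taken from the references above.

```python
import numpy as np

def proj_circle(x):
    """Projection onto the unit circle {x : ||x|| = 1} (a nonconvex manifold)."""
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.array([1.0, 0.0])  # any circle point projects 0

def proj_line(x, h=0.5):
    """Projection onto the horizontal line {x : x[1] = h}."""
    return np.array([x[0], h])

def alternating_projections(x0, cycles=100):
    """MAP: alternately project onto the line and then onto the circle."""
    x = np.asarray(x0, dtype=float)
    for _ in range(cycles):
        x = proj_circle(proj_line(x))
    return x

x = alternating_projections([2.0, 2.0])
target = np.array([np.sqrt(0.75), 0.5])   # the intersection point in the first quadrant
print(np.linalg.norm(x - target))          # small: the iterates approach K_1 ∩ K_2
```

Both projections have closed forms here, which is the situation assumed above where projecting onto each $K_l$ is easy.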

1.1. The convex SIP. One problem of the MAP is slow convergence. As discussed
in the previously mentioned references, in the presence of a regular intersection
property, one can at best expect linear convergence of the MAP. A few acceleration
methods were explored. The papers [GPR67, GK89, BDHP03] explored the acceleration
of the MAP using a line search in the case where the $K_l$ are linear subspaces.
See also the papers [HRER11, Pan15a] for newer research for this particular setting.

In [Pan15b], we looked at a different method for the convex SIP (i.e., the SIP
(1.1) when the sets $K_l$ are all convex). Each projection generates a halfspace
containing the intersection of the sets $K_l$, and one can project onto the intersection
of a number of these halfspaces using standard methods in quadratic programming
(for example an active set method [GI83] or an interior point method). We call
this the SHQP (supporting halfspace and quadratic programming) strategy. This
strategy is illustrated in Figure 1.1. We refer to [Pan15b] for more on the history
of the SHQP strategy, and we point out a few earlier papers that had some ideas
of the SHQP strategy [Pie84, CTP98, CfPTm, BCK06, PM79, MPH81].
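The QP step in the SHQP strategy projects a point onto an intersection of stored halfspaces $\{z : \langle a_r, z\rangle \le b_r\}$. As a rough illustration only, the Python sketch below (assuming NumPy) solves this projection exactly for tiny problems by enumerating candidate active sets. This brute-force approach is our own stand-in, not the active set or interior point methods mentioned above, and `project_onto_halfspaces` is a hypothetical name.

```python
import itertools
import numpy as np

def project_onto_halfspaces(x, A, b, tol=1e-9):
    """Project x onto the polyhedron {z : A z <= b} by trying every candidate
    active set and keeping the closest feasible candidate.
    Exact, but exponential in the number of rows: for tiny illustrations only."""
    x, A, b = (np.asarray(t, dtype=float) for t in (x, A, b))
    if np.all(A @ x <= b + tol):
        return x                                   # x already lies in every halfspace
    best, best_dist = None, np.inf
    for r in range(1, A.shape[0] + 1):
        for S in itertools.combinations(range(A.shape[0]), r):
            As, bs = A[list(S)], b[list(S)]
            # Projection onto the affine set {z : As z = bs}.
            z = x - np.linalg.pinv(As) @ (As @ x - bs)
            dist = np.linalg.norm(z - x)
            if np.all(A @ z <= b + tol) and dist < best_dist:
                best, best_dist = z, dist
    return best

# Two halfspaces {z_1 <= 1} and {z_2 <= 1}, e.g. produced by two earlier projections.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
print(project_onto_halfspaces([2.0, 2.0], A, b))   # both constraints active
print(project_onto_halfspaces([0.5, 3.0], A, b))   # only the second constraint active
```

The enumeration is correct because the true projection is itself the projection onto the affine set of its own active constraints, and every other feasible candidate is at least as far away.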



Figure 1.1. Refer to the diagram on the left. The method of
alternating projections on two convex sets $K_1$ and $K_2$ in $\mathbb{R}^2$ with
starting iterate $x_0$ arrives at $x_3$ in three iterations. The point
$x_4$ is the projection of $x_1$ onto the intersection of halfspaces generated
by projecting onto $K_1$ and $K_2$ earlier. One can see that
$d(x_4, K_1\cap K_2) < d(x_3, K_1\cap K_2)$, illustrating the potential of the
SHQP (supporting halfspace and quadratic programming) strategy
elaborated in [Pan15b]. The diagram on the right shows that
such a heuristic need not be effective for nonconvex sets.

The main result in [Pan15b] is the following: For a convex SIP satisfying
the linearly regular intersection property (Definition 2.5), we have an algorithm that
achieves multiple-term superlinear convergence if enough halfspaces generated from
earlier projections are stored to form the quadratic programs to be solved in later
iterations. While the proof of this result suggests keeping an impractically huge
number of halfspaces to guarantee the fast convergence, simple examples like the
one in Figure 1.1 suggest that the number of halfspaces that need to be used to
obtain the fast convergence can actually be quite small.

1.2. The nonconvex SIP. We quote from [LLM09] on the applications and background
of the SIP in the nonconvex case (i.e., when the sets in (1.1) are not
known to be convex): An example of a nonconvex set that is easy to project onto
is the set of matrices with some fixed rank. The method of alternating projections
for nonconvex problems appear in areas such as inverse eigenvalue problems
[CC96, Chu95], pole placement [Ors06, YO06], information theory [TDHS05],
low-order control design [GB00, OS96, OHM06], and image processing [BCL02,
MTW14, WA86]. Previous convergence results on nonconvex alternating projection
algorithms have been uncommon, and have either focused on a very special
case (see, for example, [CC96, LM08]), or have been much weaker than for the convex
case [CT90, TDHS05]. For more discussion, see [LM08]. More recent works on
the nonconvex SIP include [BLPW13b, BLPW13a, HL13]. See also [ABRS10].

For the nonconvex problem, the projection onto a nonconvex set need not generate
a supporting halfspace. It is easy to construct examples such that the halfspace
generated by the projection process will not contain any point in the intersection
(see for example the diagram on the right in Figure 1.1). The notion of super-regularity
(see Definition 2.2) was first defined in [LLM09]. The authors of [LLM09] also showed how
super-regularity is connected to various other well-known properties in variational
analysis. In the presence of super-regularity, they established the linear convergence
of the MAP.

1.3. Contributions of this paper. The main contribution of this paper is to
make two observations about super-regular sets. The first observation is that once
a point is close enough to a super-regular set, the projection onto this set produces
a halfspace that locally separates the point from the set. (This observation is used
to prove Claim (a) in Theorem 3.8.) With this observation, the SHQP strategy
can be carried over to super-regular sets. The second observation is that if one of
the sets is a manifold, then we can use a hyperplane to approximate the manifold
instead of using a halfspace in the QP subproblem and still obtain convergence of
our algorithms. See (2.3).

In Section 3, we show that under typical conditions in the study of alternating
projections, an algorithm (Algorithm 3.1) that has a sequence of projection steps
and SHQP steps that visits all the sets will converge linearly to a point in the
intersection. In Section 4, we show that the SHQP strategy applied to find a point
in the intersection of manifolds and super-regular sets that have only one unit normal
at each of their boundary points will converge superlinearly. The convergence is quadratic
under added conditions. This makes a connection to the Newton method. Lastly, in
Section 5, we show that arbitrarily fast linear convergence is possible when enough
halfspaces from previous iterations are kept to form the quadratic programs to
accelerate later iterations.

1.4. Notation. The notation we use is fairly standard. We let $\mathbb{B}(x,r)$ be the
closed ball with center $x$ and radius $r$, and we denote the projection onto a set $C$
by $P_C(\cdot)$.
2. Preliminaries

In this section, we recall some definitions in nonsmooth analysis and some basic 
background material on the theory of alternating projections that will be useful for 
the rest of the paper. 

Definition 2.1. (Normal cones and Clarke regularity) For a closed set $C\subset\mathbb{R}^n$,
the regular normal cone at $\bar x\in C$ is defined as

\[
\hat N_C(\bar x) := \{y \mid \langle y, x - \bar x\rangle \le o(\|x - \bar x\|) \text{ for all } x\in C\}. \tag{2.1}
\]

The limiting normal cone at $\bar x$ is defined as

\[
N_C(\bar x) := \{y \mid \text{there exist } x_i\in C,\ x_i\to\bar x, \text{ and } y_i\in\hat N_C(x_i) \text{ such that } y_i\to y\}. \tag{2.2}
\]

When $\hat N_C(\bar x) = N_C(\bar x)$, $C$ is Clarke regular at $\bar x$. If $C$ is Clarke regular at all
points, then we simply say that it is Clarke regular.

An important tool for our analysis for the rest of the paper is the following notion 
of regularity of nonconvex sets. 

Definition 2.2. [LLM09, Proposition 4.4] (Super-regularity) A closed set $C\subset\mathbb{R}^n$
is super-regular at a point $\bar x\in C$ if, for all $\delta > 0$, we can find a neighborhood $V$ of
$\bar x$ such that

\[
\langle z - y, v\rangle \le \delta\|z - y\|\,\|v\| \quad\text{for all } z,y\in C\cap V \text{ and } v\in N_C(y).
\]

We say that $C$ is super-regular if it is super-regular at all points.
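As a sanity check of Definition 2.2 on a simple manifold, the Python sketch below (assuming NumPy; all names are hypothetical) estimates the smallest admissible $\delta$ for the unit circle when $V$ is the arc of half-angle $r$ around $\bar x = (1,0)$. For the circle this quantity works out to $\sin r$, which tends to $0$ as the neighborhood shrinks, as super-regularity requires.

```python
import numpy as np

def superregularity_ratio(r, samples=400):
    """Largest <z - y, v> / (||z - y|| ||v||) over points y != z of the unit circle
    within angle r of xbar = (1, 0), and unit normals v = +y or -y at y."""
    angles = np.linspace(-r, r, samples)
    pts = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    worst = -np.inf
    for y in pts:
        diff = pts - y
        norms = np.linalg.norm(diff, axis=1)
        keep = norms > 1e-12                        # exclude z = y
        for v in (y, -y):                           # the normal cone at y is span{y}
            worst = max(worst, np.max(diff[keep] @ v / norms[keep]))
    return worst

for r in (0.5, 0.1, 0.01):
    # For the circle the ratio equals |sin(angle between z and y / 2)| <= sin(r).
    print(r, superregularity_ratio(r), np.sin(r))
```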

The discussion in [LLM09] also shows that

(1) Super-regularity at a point implies Clarke regularity there [LLM09, Corollary
4.5]. (The converse is not true [LLM09, Example 4.6].)

(2) Either amenability at a point or prox-regularity at a point implies super-regularity
there [LLM09, Propositions 4.8 and 4.9].

We assume that all the sets involved in this paper are super-regular. In view of
property (1), we will not need to distinguish between $\hat N_C(x)$ and $N_C(x)$ for the rest
of the paper.

Remark 2.3. (On manifolds) It is clear that if $M$ is a smooth manifold in the usual
sense, then $M$ is super-regular. Moreover,

\[
\text{For all } x\in M,\ v\in N_M(x) \text{ implies } -v\in N_M(x). \tag{2.3}
\]

For the rest of our discussions, we shall let a manifold be a super-regular set satisfying
(2.3).

The following property relates $d(x,\bigcap_{l=1}^m K_l)$ to $\max_{1\le l\le m} d(x,K_l)$.

Definition 2.4. (Local metric inequality) We say that a collection of closed sets
$K_l\subset\mathbb{R}^n$, $l = 1,\dots,m$, satisfies the local metric inequality at $\bar x$ if there is a $\beta > 0$
and a neighborhood $V$ of $\bar x$ such that

\[
d\Big(x, \bigcap_{l=1}^m K_l\Big) \le \beta\max_{1\le l\le m} d(x,K_l) \quad\text{for all } x\in V. \tag{2.4}
\]
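A small worked case of (2.4), offered only as an illustration: for two distinct lines through the origin in $\mathbb{R}^2$ meeting at angle $\theta\in(0,\pi)$, the best constant is $\beta = 1/\min(\sin(\theta/2), \cos(\theta/2))$, attained on the bisectors, and it blows up as $\theta\to 0$. The Python sketch below (assuming NumPy; the function name is hypothetical) estimates $\beta$ by sampling directions.

```python
import numpy as np

def metric_inequality_constant(theta, samples=200001):
    """Estimate the smallest beta in (2.4) for K_1 = x-axis and K_2 = the line
    through 0 at angle theta in (0, pi), so that K = K_1 ∩ K_2 = {0}.
    The ratio d(x, K) / max_l d(x, K_l) is scale invariant, so unit x suffice."""
    phi = np.linspace(0.0, np.pi, samples)      # direction of the unit vector x
    d1 = np.abs(np.sin(phi))                    # d(x, K_1)
    d2 = np.abs(np.sin(phi - theta))            # d(x, K_2)
    return np.max(1.0 / np.maximum(d1, d2))     # d(x, K) = ||x|| = 1

theta = np.pi / 3
print(metric_inequality_constant(theta))                  # about 2
print(1 / min(np.sin(theta / 2), np.cos(theta / 2)))      # closed form
```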

A concise summary of further studies on the local metric inequality appears
in [Kru06], who in turn referred to [BBL99, Iof00, NT01, NY04] on the topic of
local metric inequality and its connection to metric regularity. Definition 2.4 is
sufficient for our purposes. The local metric inequality is useful for proving the
linear convergence of alternating projection algorithms [BB93, LLM09]. See [BB96]
for a survey.

Definition 2.5. (Linearly regular intersection) For closed sets $K_l\subset\mathbb{R}^n$, we say
that $\{K_l\}_l$ has linearly regular intersection at $\bar x\in K := \bigcap_{l=1}^m K_l$ if the following
condition holds:

\[
\text{If } \sum_{l=1}^m v_l = 0 \text{ for some } v_l\in N_{K_l}(\bar x), \text{ then } v_l = 0 \text{ for all } l\in\{1,\dots,m\}. \tag{2.5}
\]

The linearly regular intersection property appears in [RW98, Theorem 6.42] as a
condition for proving that $N_{\bigcap_{l=1}^m K_l}(\bar x) = \sum_{l=1}^m N_{K_l}(\bar x)$. As discussed in [Kru06]
and related papers, linearly regular intersection is related to the sensitivity analysis
of the SIP (1.1). Linearly regular intersection implies the (local) linear convergence of
the method of alternating projections (for super-regular sets, see [LLM09]). Furthermore,
linearly regular intersection implies the local metric inequality, but the converse is not true.
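The quantity $\eta$ used later in Theorem 3.8 measures how far condition (2.5) is from failing. For two lines through the origin in $\mathbb{R}^2$ at angle $\theta$, a short computation gives $\eta = \min(\sin(\theta/2), \cos(\theta/2))$, which is positive exactly when the lines are distinct, i.e. exactly when (2.5) holds. The Python sketch below (assuming NumPy; the function name is hypothetical) checks this closed form by brute force.

```python
import numpy as np

def eta_two_lines(theta, samples=100001):
    """eta = min ||v_1 + v_2|| over v_l in N_{K_l}(0) with ||v_1|| + ||v_2|| = 1,
    where K_1 is the x-axis and K_2 the line through 0 at angle theta.
    N_{K_l}(0) is the line spanned by the unit normal n_l, so v_1 = t n_1 and
    v_2 = +/-(1 - t) n_2 with t in [0, 1] (a global sign flip leaves the norm unchanged)."""
    n1 = np.array([0.0, 1.0])
    n2 = np.array([-np.sin(theta), np.cos(theta)])
    t = np.linspace(0.0, 1.0, samples)[:, None]
    return min(np.linalg.norm(t * n1 + s * (1 - t) * n2, axis=1).min() for s in (1.0, -1.0))

for theta in (np.pi / 2, np.pi / 6, 0.01):
    # Closed form for two lines: eta = min(sin(theta / 2), cos(theta / 2)).
    print(theta, eta_two_lines(theta), min(np.sin(theta / 2), np.cos(theta / 2)))
```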

The following easy and well-known principle is used to prove the Fejér monotonicity
of iterates in Theorems 5.2 and 5.3.

Proposition 2.6. (Fejér monotonicity) Suppose $C$ is a closed convex set in $\mathbb{R}^n$,
with $x\notin C$ and $y\in C$. Then for any $\lambda\in[0,1]$,

\[
\|y - [P_C(x) + \lambda(P_C(x) - x)]\| \le \|y - x\|,
\]

and the inequality is strict if $\lambda\in[0,1)$.
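A quick numerical check of Proposition 2.6, assuming (as an arbitrary choice) that $C$ is the closed unit ball in $\mathbb{R}^n$. The Python sketch (NumPy; names hypothetical) samples $x\notin C$, $y\in C$ and $\lambda\in[0,1]$ and tests the inequality; $\lambda = 1$ corresponds to reflecting $x$ through $P_C(x)$.

```python
import numpy as np

def proj_ball(x):
    """Projection onto the closed unit ball C (a closed convex set)."""
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

def check_fejer(trials=2000, n=3, seed=0):
    """Test ||y - [P_C(x) + lam (P_C(x) - x)]|| <= ||y - x|| for random
    x outside C, y in C and lam in [0, 1]."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = 3 * rng.normal(size=n)
        if np.linalg.norm(x) <= 1:
            continue                          # Proposition 2.6 needs x outside C
        y = proj_ball(rng.normal(size=n))     # an arbitrary point of C
        p = proj_ball(x)
        lam = rng.uniform(0.0, 1.0)
        if np.linalg.norm(y - (p + lam * (p - x))) > np.linalg.norm(y - x) + 1e-12:
            return False
    return True

print(check_fejer())
```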

3. Basic local convergence for super-regular SIP 

In the absence of additional information on the global structure of a nonconvex 
SIP, the analysis of convergence must necessarily be local. In this section, we discuss 
how super-regularity can give a halfspace that locally separates a point from the 
intersection of the sets. This leads to the local linear convergence of an alternating 
projection algorithm that incorporates QP steps whenever possible. 

We begin with the algorithm that we study for this section. 

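Algorithm 3.1. (Basic algorithm) Let $K_l$ be (not necessarily convex) closed sets
in $\mathbb{R}^n$ for $l\in\{1,\dots,m\}$. From a starting point $x_0\in\mathbb{R}^n$, this algorithm finds a
point in the intersection $K := \bigcap_{l=1}^m K_l$.

01 For iteration $i = 0,1,\dots$
02     Set $x_i^0 = x_i$.
03     Find sets $S_1,\dots,S_m\subset\{1,\dots,m\}$ such that $\bigcup_{j=1}^m S_j = \{1,\dots,m\}$.
04     For $j = 1,\dots,m$
05         Find $x_{i,j,l}\in P_{K_l}(x_i^{j-1})$ for all $l\in S_j$.
06         For $l\in S_j$, define $H_{i,j,l}$ by
\[
H_{i,j,l} := \begin{cases}\{x : \langle x_i^{j-1} - x_{i,j,l},\, x - x_{i,j,l}\rangle = 0\} & \text{if } K_l \text{ is a manifold},\\ \{x : \langle x_i^{j-1} - x_{i,j,l},\, x - x_{i,j,l}\rangle \le 0\} & \text{otherwise}.\end{cases} \tag{3.1a}
\]
07         Define the polyhedron $F_i^j := \bigcap_{(k,l)\in S_i^j} H_{i,k,l}$, where
08         $S_i^j\subset\{(k,l) : l\in S_k,\ k\in\{1,\dots,j\}\}$, and $(k_1,l),(k_2,l)\in S_i^j$ implies $k_1 = k_2$.   (3.1b)
09         Set $x_i^j = P_{F_i^j}(x_i^{j-1})$.
10     end for
11     Set $x_{i+1} = x_i^m$.
12 end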


We allow some of the $S_j$'s to be empty as long as the condition $\bigcup_{j=1}^m S_j = \{1,\dots,m\}$
is satisfied. When $S_j = \{j\}$ and $S_i^j = \{(j,j)\}$ for all $i,j$, Algorithm
3.1 reduces to the alternating projection algorithm. Algorithm 3.1 has the given
design because we believe that by performing QP steps with polyhedra that bound
the sets $K_l$ better, the convergence to a point in $K$ can be accelerated. Yet we
still retain flexibility in the size of the QPs so that each step can be performed
with a reasonable amount of effort.


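Remark 3.2. (Mass projection) Another particular case of Algorithm 3.1 we will
study in Section 4 is when $S_1 = \{1,\dots,m\}$, $S_j = \emptyset$ for all $j\in\{2,\dots,m\}$, and
$S_i^j = \{j\}\times S_j$ for all $i$ and all $j\in\{1,\dots,m\}$. In such a case, Algorithm 3.1 is simplified
to

\[
\begin{aligned}
x_{i,1,l} &\in P_{K_l}(x_i) \quad\text{for all } l\in\{1,\dots,m\},\\
H_{i,1,l} &:= \begin{cases}\{x : \langle x_i - x_{i,1,l},\, x - x_{i,1,l}\rangle = 0\} & \text{if } K_l \text{ is a manifold},\\ \{x : \langle x_i - x_{i,1,l},\, x - x_{i,1,l}\rangle \le 0\} & \text{otherwise},\end{cases}\\
x_{i+1} &= P_{\bigcap_{l=1}^m H_{i,1,l}}(x_i).
\end{aligned}
\]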
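To illustrate the simplified iteration of Remark 3.2, the Python sketch below (assuming NumPy) treats two unit circles centred at $(0,0)$ and $(1,0)$ as manifolds, so each $H_{i,1,l}$ is the tangent line at $P_{K_l}(x_i)$ and $x_{i+1}$ is the projection of $x_i$ onto the intersection of these lines. The example sets, starting point and function names are our own hypothetical choices. For circles this step coincides with a Newton step on $x\mapsto(\|x\| - 1, \|x - (1,0)\| - 1)$, so fast local convergence is expected, in line with Section 4.

```python
import numpy as np

CENTERS = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]   # K_1, K_2: unit circles (manifolds)

def proj_circle(x, c):
    """Projection onto the unit circle centred at c."""
    d = x - c
    n = np.linalg.norm(d)
    return c + (d / n if n > 0 else np.array([1.0, 0.0]))

def mass_projection_step(x):
    """One iteration of Remark 3.2 when every K_l is a manifold:
    x_{i,1,l} = P_{K_l}(x), H_{i,1,l} = {z : <x - x_{i,1,l}, z - x_{i,1,l}> = 0},
    and the new iterate is the projection of x onto the affine set ∩_l H_{i,1,l}."""
    rows, rhs = [], []
    for c in CENTERS:
        p = proj_circle(x, c)
        a = x - p
        na = np.linalg.norm(a)
        if na > 1e-14:                 # x numerically on K_l: H_{i,1,l} is all of R^n
            rows.append(a / na)
            rhs.append(a @ p / na)
    if not rows:
        return x
    A, b = np.array(rows), np.array(rhs)
    return x - np.linalg.pinv(A) @ (A @ x - b)   # least-norm move onto {z : A z = b}

x = np.array([0.4, 1.0])
target = np.array([0.5, np.sqrt(3) / 2])
errors = []
for _ in range(6):
    x = mass_projection_step(x)
    errors.append(np.linalg.norm(x - target))
print(errors)   # the errors shrink roughly quadratically until machine precision
```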


Remark 3.3. (On the polyhedron $F_i^j$) The polyhedron $F_i^j$ is defined by intersecting
some of the halfspaces/hyperplanes $H_{i,k,l}$. The line (3.1b) in (3.1) defining $S_i^j$
ensures that no two of the halfspaces/hyperplanes $H_{i,k,l}$ that are intersected to
form $F_i^j$ come from projecting onto the same set. To see why we need (3.1b), observe
that we can draw two tangent lines to a manifold in $\mathbb{R}^2$ that do not intersect, which
would lead to $F_i^j = \emptyset$.


Remark 3.4. (Treatment of manifolds) Another feature of this algorithm is that
when $K_l$ is a manifold, the set $H_{i,j,l}$ is a hyperplane instead. Manifolds are super-regular
sets. We take advantage of property (2.3) of manifolds to create a more
natural algorithm. The hyperplane is a better approximation of a manifold than a
halfspace, and we may expect faster convergence to a point in $K$ when we use
hyperplanes instead. Another advantage of using hyperplanes is that quadratic
programming algorithms resolve equality constraints (which are always tight) better
than they resolve inequality constraints (where determining whether each constraint
is tight at the optimal solution requires some effort).


The lemma below will be useful in studying the convergence of the algorithms 
throughout this paper. 

Lemma 3.5. (Linear convergence conditions) Let $K$ be a closed set in $\mathbb{R}^n$. Suppose an
algorithm generates iterates $\{x_i\}$ such that

(1) there exists some $\rho\in(0,1)$ such that $d(x_{i+1},K)\le\rho\, d(x_i,K)$, and

(2) there exists a constant $c > 0$ such that $\|x_{i+1} - x_i\|\le c\, d(x_i,K)$.

Then the sequence $\{x_i\}$ converges to a point $\bar x\in K$, and we have, for all $i\ge 0$,

(a) $\|x_i - \bar x\|\le\frac{c}{1-\rho} d(x_i,K)\le\frac{c\rho^i}{1-\rho} d(x_0,K)$, and

(b) $\mathbb{B}\big(x_{i+1}, \frac{c}{1-\rho} d(x_{i+1},K)\big)\subset\mathbb{B}\big(x_i, \frac{c}{1-\rho} d(x_i,K)\big)$.
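Proof. For any $j\ge 0$, we have

\[
\|x_{i+j+1} - x_{i+j}\|\le c\, d(x_{i+j},K)\le c\rho^j d(x_i,K).
\]

Standard arguments in analysis show that $\{x_i\}$ is a Cauchy sequence, which converges
to a point $\bar x$; since $d(x_i,K)\to 0$ and $K$ is closed, $\bar x\in K$. Summing the above bound over
$j\ge 0$ gives $\|x_i - \bar x\|\le\sum_{j\ge 0} c\rho^j d(x_i,K) = \frac{c}{1-\rho} d(x_i,K)$, which together with (1) proves (a).
For (b), note that $\|x_{i+1} - x_i\| + \frac{c}{1-\rho} d(x_{i+1},K)\le c\, d(x_i,K) + \frac{c\rho}{1-\rho} d(x_i,K) = \frac{c}{1-\rho} d(x_i,K)$. □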
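The bounds in Lemma 3.5 can be checked on alternating projections between two lines through the origin in $\mathbb{R}^2$ (an illustrative choice; names hypothetical). Started on one line, every projection shrinks the norm by exactly $\cos\theta$ and moves the point by $\sin\theta$ times its norm, so $\rho = \cos\theta$ and $c = \sin\theta$. The Python sketch (assuming NumPy) recovers these constants from the iterates and verifies (a) and (b).

```python
import numpy as np

def proj_line(x, u):
    """Projection onto the line span{u}, where u is a unit vector."""
    return (x @ u) * u

theta = 0.3
u1 = np.array([1.0, 0.0])
u2 = np.array([np.cos(theta), np.sin(theta)])

# Alternating projections between K_1 = span{u1} and K_2 = span{u2}, started on K_1,
# so that K = {0} and d(x_i, K) = ||x_i||.
xs = [np.array([1.0, 0.0])]
for i in range(60):
    xs.append(proj_line(xs[-1], u2 if i % 2 == 0 else u1))
d = [np.linalg.norm(x) for x in xs]

# Empirical constants for conditions (1) and (2) of Lemma 3.5.
rho = max(d[i + 1] / d[i] for i in range(len(xs) - 1))                          # cos(theta)
c = max(np.linalg.norm(xs[i + 1] - xs[i]) / d[i] for i in range(len(xs) - 1))  # sin(theta)
R = c / (1 - rho)
xbar = np.zeros(2)                      # the limit of the iterates

bound_a = all(np.linalg.norm(x - xbar) <= R * di + 1e-12 for x, di in zip(xs, d))
bound_b = all(np.linalg.norm(xs[i + 1] - xs[i]) + R * d[i + 1] <= R * d[i] + 1e-12
              for i in range(len(xs) - 1))
print(rho, c, bound_a, bound_b)
```

In this example the ball inclusion (b) holds with equality at every step, which shows that the constant $c/(1-\rho)$ in the lemma cannot be reduced in general.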


The next result shows how a halfspace derived by projecting onto an intersection of
halfspaces relates to the original halfspaces.


Lemma 3.6. (Derived supporting halfspaces) Let $x\in\mathbb{R}^n$, and suppose $H_1, H_2,
\dots, H_k$ are $k$ halfspaces containing $x$ such that $d(x,\partial H_i)$, the distance from $x$ to
the boundary of each halfspace $H_i$, is at most $a$. Suppose the normal vector of
each halfspace $H_i$ is $v_i$, where $\|v_i\| = 1$, and the constant $\eta$ defined by

\[
\eta := \min\Big\{\Big\|\sum_{i=1}^k\lambda_i v_i\Big\| : \sum_{i=1}^k\lambda_i = 1,\ \lambda_i\ge 0 \text{ for all } i\in\{1,\dots,k\}\Big\} \tag{3.2}
\]

is positive (i.e., $\eta\neq 0$). Let $F$ be the intersection of these halfspaces. Let $H$ be
the halfspace containing $F$ produced by projecting from a point $x'\notin F$ onto $F$. In
other words, the halfspace $H$ is defined by

\[
H := \{z : \langle x' - P_F(x'),\, z - P_F(x')\rangle\le 0\}.
\]

Then the distance of $x$ from the boundary of $H$ is at most $a/\eta$.

As a consequence, suppose the $H_i$ are defined by $H_i = \{z : \langle v_i, z\rangle\le a\}$. Let
$v = \frac{\sum_{i=1}^k\lambda_i v_i}{\|\sum_{i=1}^k\lambda_i v_i\|}$ for some nonzero vector $\lambda\in\mathbb{R}^k$ that has nonnegative components, and
let $\tilde H = \{z : \langle v, z\rangle\le a/\eta\}$. Then we have $\bigcap_{i=1}^k H_i\subset\tilde H$.


Proof. We remark that $\eta$ is the distance of the origin to the convex hull of $\{v_i\}_{i=1}^k$.
We can eliminate halfspaces if necessary and assume that $k\ge 1$, and that $P_F(x')$ lies
on the boundaries of all the halfspaces. The KKT condition tells us that $x' - P_F(x')$
lies in the conical hull of $\{v_i\}_{i=1}^k$. By Carathéodory's theorem, we can assume that
$k$ is not more than the dimension $n$. We can also eliminate halfspaces if necessary
so that the vectors $\{v_i\}_{i=1}^k$ are linearly independent.

Suppose each halfspace $H_i$ is defined by $\{z : \langle v_i, z\rangle\le b_i\}$, where $b_i\in\mathbb{R}$. Since
$P_F(x')$ lies on the boundaries of the halfspaces $H_i$, we have

\[
\langle v_i, P_F(x')\rangle = b_i \quad\text{for all } i. \tag{3.3}
\]

Define the hyperslab $S_i$ by

\[
S_i := \{z : \langle v_i, z\rangle\in[b_i - a, b_i]\}. \tag{3.4}
\]

Since the distance from $x$ to the boundary of each halfspace $H_i$ was assumed to
be at most $a$, the point $x$ is inside all the hyperslabs $S_i$.

Let $v$ be the vector $\frac{x' - P_F(x')}{\|x' - P_F(x')\|}$. We now study the problem

\[
\min_z\ \langle v, z\rangle \quad\text{s.t. } z\in S_i \text{ for all } i\in\{1,\dots,k\}. \tag{3.5}
\]

If the above problem were a maximization problem instead, then an optimizer would be
$P_F(x')$. Consider the point $P_F(x') - ad$, where $d$ is the direction defined through

\[
\langle v_i, d\rangle = 1 \text{ for all } i, \quad\text{and}\quad d\in\operatorname{span}\{v_i\}_{i=1}^k. \tag{3.6}
\]

Since the vectors $\{v_i\}_{i=1}^k$ are linearly independent, such a $d$ exists, and it can be calculated
by $d = QR^{-T}\mathbf{1}$, where $\mathbf{1}$ is the vector of all ones, $QR$ is the (reduced) QR factorization
of $V$, and $V$ is the matrix formed by concatenating the vectors $\{v_i\}_{i=1}^k$. We can
use (3.3) and (3.6) to calculate that

\[
\langle v_i, P_F(x') - ad\rangle = b_i - a \quad\text{for all } i,
\]

so $P_F(x') - ad$ is on the other boundary of all the hyperslabs $S_i$. Furthermore, since
$N_{\bigcap_{i=1}^k S_i}(P_F(x') - ad) = -N_{\bigcap_{i=1}^k S_i}(P_F(x'))$, we have $-v\in N_{\bigcap_{i=1}^k S_i}(P_F(x') - ad)$.
Hence $P_F(x') - ad$ is a minimizer of (3.5).

We proceed to find the optimal value of (3.5). Since $v$ lies in the conical hull
of $\{v_i\}_{i=1}^k$, $v$ can be written as $\frac{V\lambda}{\|V\lambda\|}$, where $\lambda\in\mathbb{R}^k_+$ is a vector with nonnegative
elements such that its elements sum to one. We can calculate

\[
\langle v, d\rangle = \frac{\lambda^TV^TQR^{-T}\mathbf{1}}{\|V\lambda\|} = \frac{\lambda^TR^TQ^TQR^{-T}\mathbf{1}}{\|V\lambda\|} = \frac{\lambda^T\mathbf{1}}{\|V\lambda\|} = \frac{1}{\|V\lambda\|}.
\]

By the definition of $\eta$, we have $\frac{1}{\|V\lambda\|}\le\frac{1}{\eta}$. This means that the minimum value
of (3.5) is $\langle v, P_F(x') - ad\rangle = \langle v, P_F(x')\rangle - \frac{a}{\|V\lambda\|}\ge\langle v, P_F(x')\rangle - \frac{a}{\eta}$. Since $x\in S_i$ for all
$i\in\{1,\dots,k\}$, we can deduce that $x$ lies in the hyperslab

\[
\{z : \langle v, z\rangle\in[\langle v, P_F(x')\rangle - a/\eta,\ \langle v, P_F(x')\rangle]\}.
\]

In other words, $x$ lies in the halfspace $\{z : \langle v, z\rangle\le\langle v, P_F(x')\rangle\}$, which is $H$, and the distance
from $x$ to the boundary of this halfspace is at most $a/\eta$, which is the conclusion we
seek.

The final paragraph is easily deduced from the main result. □
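The formula $d = QR^{-T}\mathbf{1}$ and the identity $\langle v, d\rangle = 1/\|V\lambda\|$ used in the proof of Lemma 3.6 are easy to confirm numerically. The Python sketch below assumes NumPy, whose `numpy.linalg.qr` returns the reduced factorization by default, and uses random unit vectors, which are linearly independent with probability one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3
V = rng.normal(size=(n, k))
V /= np.linalg.norm(V, axis=0)            # columns v_1, ..., v_k are unit vectors

Q, R = np.linalg.qr(V)                    # reduced QR: Q is n-by-k, R is k-by-k
d = Q @ np.linalg.solve(R.T, np.ones(k))  # d = Q R^{-T} 1

lam = rng.random(k)
lam /= lam.sum()                          # nonnegative weights summing to one
w = V @ lam
v = w / np.linalg.norm(w)                 # a unit vector in the conical hull of the v_i

print(V.T @ d)                            # <v_i, d> = 1 for every i
print(v @ d, 1 / np.linalg.norm(w))       # <v, d> = 1 / ||V lambda||
```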


Remark 3.7. (The notation $\eta$) We remark that the use of the notation $\eta$ in Lemma 3.6
is consistent with the notation of [Kru06] and related papers, where the relationship
between the constants related to the sensitivity analysis of the SIP (1.1) and linearly
regular intersection is studied.
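A Monte Carlo check of Lemma 3.6 in $\mathbb{R}^2$, under illustrative assumptions of our own: two halfspaces $\{z : \langle v_i, z\rangle\le 0\}$ whose unit normals are at angle $\phi = 2$, so that $\eta = \cos(\phi/2)$. The Python sketch (assuming NumPy; names hypothetical) computes $P_F(x')$ exactly by enumerating active sets, forms the halfspace $H$, and reports the largest observed value of $d(x,\partial H)/(a/\eta)$, which the lemma says cannot exceed $1$.

```python
import itertools
import numpy as np

def project_polyhedron(x, A, b, tol=1e-9):
    """Exact projection onto {z : A z <= b} by enumerating active sets (tiny problems only)."""
    if np.all(A @ x <= b + tol):
        return x
    best, best_dist = None, np.inf
    for r in range(1, A.shape[0] + 1):
        for S in itertools.combinations(range(A.shape[0]), r):
            As, bs = A[list(S)], b[list(S)]
            z = x - np.linalg.pinv(As) @ (As @ x - bs)
            dist = np.linalg.norm(z - x)
            if np.all(A @ z <= b + tol) and dist < best_dist:
                best, best_dist = z, dist
    return best

def lemma_3_6_worst_ratio(phi=2.0, a=0.3, trials=20000, seed=0):
    """Max of d(x, bd H) / (a / eta) over random x and x' (Lemma 3.6 predicts <= 1)."""
    rng = np.random.default_rng(seed)
    V = np.array([[1.0, 0.0], [np.cos(phi), np.sin(phi)]])   # rows are v_1, v_2
    b = np.zeros(2)
    eta = np.cos(phi / 2)          # distance from 0 to the segment [v_1, v_2]
    worst = 0.0
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0, size=2)
        s = V @ x
        if np.any(s > 0) or np.any(s < -a):
            continue               # need x in every H_i with d(x, bd H_i) <= a
        xp = rng.uniform(-3.0, 3.0, size=2)
        p = project_polyhedron(xp, V, b)
        gap = np.linalg.norm(xp - p)
        if gap < 1e-9:
            continue               # x' must lie outside F
        w = (xp - p) / gap         # unit normal of H
        worst = max(worst, (w @ p - w @ x) / (a / eta))
    return worst

print(lemma_3_6_worst_ratio())
```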

We now prove our result on the convergence of Algorithm 3.1.


Theorem 3.8. (Local linear convergence of general algorithm) Suppose $K_l$, where
$l\in\{1,\dots,m\}$, are super-regular at $x^*\in K = \bigcap_{l=1}^m K_l$. Suppose that $\eta$ defined by

\[
\eta := \min\Big\{\Big\|\sum_{l=1}^m v_l\Big\| : v_l\in N_{K_l}(x^*),\ \sum_{l=1}^m\|v_l\| = 1\Big\}
\]

is positive (i.e., $\eta\neq 0$). This is equivalent to $\{K_l\}_{l=1}^m$ having linearly regular intersection
at $x^*$, which in turn implies that the local metric inequality holds at $x^*$. If $x_0$
is sufficiently close to $x^*$, then Algorithm 3.1 converges to a point in $K$ Q-linearly
(i.e., at a rate bounded above by a geometric sequence).


Proof. Since the local metric inequality holds at $x^*$, let $\beta > 1$ and $V$ be a neighborhood
of $x^*$ such that

\[
d(x,K)\le\beta\max_{1\le l\le m} d(x,K_l) \quad\text{for all } x\in V.
\]
Let

\[
\rho = \sqrt{\Big(1 + \frac{1}{2m^3\beta^2}\Big)^2 - \Big(\frac{1}{m\beta} - \frac{1}{4m^4\beta^2}\Big)^2 + \Big(\frac{1}{4m^4\beta^2}\Big)^2} \tag{3.7a}
\]

and

\[
c = \frac{1}{4m^3\beta^2} + \sqrt{m}\sqrt{1 + \frac{1}{2m^3\beta^2} + \frac{1}{16m^6\beta^4}}. \tag{3.7b}
\]

Since $\rho^2 = 1 - \frac{m-1}{\beta^2m^3} + \frac{1}{2\beta^3m^5} + \frac{1}{4\beta^4m^6}$ and $\beta > 1$, it is easy to see that if $m\ge 2$, then $\rho < 1$.
Choose $\delta > 0$ such that $\delta < \frac{(1-\rho)\eta}{16m^4\beta^2c}$. Since all the sets $K_l$, where $l\in\{1,\dots,m\}$, are
super-regular at $x^*$, we can shrink the neighborhood $V$ if necessary so that for all $l\in\{1,\dots,m\}$,
we have

\[
\langle v, z - y\rangle\le\delta\|v\|\,\|z - y\| \quad\text{for all } z,y\in K_l\cap V \text{ and } v\in N_{K_l}(y).
\]

By the outer semicontinuity of the normal cones, we can shrink $V$ if necessary so
that

\[
\min\Big\{\Big\|\sum_{l=1}^m v_l\Big\| : v_l\in N_{K_l}(x_l),\ x_l\in K_l\cap V,\ \sum_{l=1}^m\|v_l\| = 1\Big\} > \frac{\eta}{2}.
\]

Suppose $x_0$ is close enough to $x^*$ that $\mathbb{B}\big(x_0, \frac{c}{1-\rho} d(x_0,K)\big)\subset V$. Provided
that we prove conditions (1) and (2) in Lemma 3.5, we have the convergence of the
iterates $\{x_i\}$ to some point $\bar x\in K$, and the convergence of $\{x_i\}$ to $\bar x$ is at the
rate given in Lemma 3.5(a).

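If $x\in K_l\cap\mathbb{B}\big(x_i, \frac{c}{1-\rho} d(x_i,K)\big)$ and $x_{i,j,l}\in\mathbb{B}\big(x_i, \frac{c}{1-\rho} d(x_i,K)\big)$, then

\[
\Big\langle\frac{x_i^{j-1} - x_{i,j,l}}{\|x_i^{j-1} - x_{i,j,l}\|},\, x - x_{i,j,l}\Big\rangle\le\delta\|x - x_{i,j,l}\|\le\delta\frac{2c}{1-\rho} d(x_i,K) < \frac{\eta}{8m^4\beta^2} d(x_i,K). \tag{3.8}
\]

Define the halfspace $\tilde H_{i,j,l}$ by

\[
\tilde H_{i,j,l} := \Big\{x : \Big\langle\frac{x_i^{j-1} - x_{i,j,l}}{\|x_i^{j-1} - x_{i,j,l}\|},\, x - x_{i,j,l}\Big\rangle\le\frac{\eta}{8m^4\beta^2} d(x_i,K)\Big\}.
\]

(Note that the halfspace $H_{i,j,l}$ defined in Algorithm 3.1 is similar to $\tilde H_{i,j,l}$, with
the exception that the right-hand side of the inequality is zero.) We have
$K_l\cap\mathbb{B}\big(x_i, \frac{c}{1-\rho} d(x_i,K)\big)\subset\tilde H_{i,j,l}$. Note that $x_i^j$ is the projection of $x_i^{j-1}$ onto $F_i^j$. Define
the halfspace $H^+_{i,j}$ by

\[
H^+_{i,j} := \Big\{x : \Big\langle\frac{x_i^{j-1} - x_i^j}{\|x_i^{j-1} - x_i^j\|},\, x - x_i^j\Big\rangle\le\frac{1}{4m^4\beta^2} d(x_i,K)\Big\}. \tag{3.9}
\]

By Lemma 3.6 (applied with $\eta/2$ in place of $\eta$), we have

\[
K\cap\mathbb{B}\Big(x_i, \frac{c}{1-\rho} d(x_i,K)\Big)\subset\bigcap_{(k,l)\in S_i^j}\tilde H_{i,k,l}\subset H^+_{i,j}. \tag{3.10}
\]

Note that almost exactly the same arguments work if the set $K_l$ is a manifold, but
we may have to take $-\frac{x_i^{j-1} - x_{i,j,l}}{\|x_i^{j-1} - x_{i,j,l}\|}$ as the normal vector of $\tilde H_{i,j,l}$ instead and define
$H^+_{i,j}$ differently, depending on the multipliers in the KKT condition.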
Claim:

(a) If $\|x_i^j - x_i^{j-1}\| > \frac{1}{4m^4\beta^2} d(x_i,K)$, then

\[
d(x_i^j,K)^2\le d(x_i^{j-1},K)^2 - \Big[\|x_i^j - x_i^{j-1}\| - \frac{1}{4m^4\beta^2} d(x_i,K)\Big]^2 + \Big[\frac{1}{4m^4\beta^2} d(x_i,K)\Big]^2.
\]

(b) If $\|x_i^j - x_i^{j-1}\|\le\frac{1}{4m^4\beta^2} d(x_i,K)$, then $d(x_i^j,K)\le d(x_i^{j-1},K) + \frac{1}{4m^4\beta^2} d(x_i,K)$.

Part (b) is obvious. We now prove part (a). Let $y$ be any point in $P_K(x_i^{j-1})$,
and let $z = P_{H^+_{i,j}}(x_i^{j-1})$. See Figure 3.1, where $d_2 = \frac{1}{4m^4\beta^2} d(x_i,K)$ in view of (3.9).
Noting that $\angle y z x_i^j\ge\pi/2$, we apply the cosine rule to get

\[
\begin{aligned}
d(x_i^j,K)^2 &\le \|y - x_i^j\|^2\\
&\le \|y - z\|^2 + \|z - x_i^j\|^2\\
&\le \|y - x_i^{j-1}\|^2 - \|x_i^{j-1} - z\|^2 + \|z - x_i^j\|^2\\
&= d(x_i^{j-1},K)^2 - \Big[\|x_i^j - x_i^{j-1}\| - \frac{1}{4m^4\beta^2} d(x_i,K)\Big]^2 + \Big[\frac{1}{4m^4\beta^2} d(x_i,K)\Big]^2.
\end{aligned}
\]

This completes the proof of the claim.

Figure 3.1. This figure illustrates the proof of the claim in Theorem 3.8. Note that
$d_1 = \|x_i^j - x_i^{j-1}\| - \frac{1}{4m^4\beta^2} d(x_i,K)$ and $d_2 = \frac{1}{4m^4\beta^2} d(x_i,K)$.


It now remains to prove conditions (1) and (2) of Lemma 3.5. By the local metric
inequality, there is some $j\in\{1,\dots,m\}$ such that $d(x_i,K_j)\ge\frac{1}{\beta} d(x_i,K)$. Hence
there is a distance $\|x_i^j - x_i^{j-1}\|$ that will be at least $\frac{1}{m\beta} d(x_i,K)$. Making use of the
claim earlier, we have the following estimate of $d(x_{i+1},K)$:

\[
\begin{aligned}
d(x_{i+1},K)^2 &\le \Big(1 + \frac{1}{2m^3\beta^2}\Big)^2 d(x_i,K)^2 - \Big(\frac{1}{m\beta} - \frac{1}{4m^4\beta^2}\Big)^2 d(x_i,K)^2 + \Big(\frac{1}{4m^4\beta^2}\Big)^2 d(x_i,K)^2\\
&= \rho^2 d(x_i,K)^2.
\end{aligned} \tag{3.11}
\]
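This proves that $d(x_{i+1},K)\le\rho\, d(x_i,K)$. Next,

\[
\begin{aligned}
\|x_{i+1} - x_i\| &\le \sum_{j=1}^m\|x_i^j - x_i^{j-1}\|\\
&\le \frac{m}{4m^4\beta^2} d(x_i,K) + \sum_{j=1}^m\max\Big\{\|x_i^j - x_i^{j-1}\| - \frac{1}{4m^4\beta^2} d(x_i,K), 0\Big\}\\
&\le \frac{1}{4m^3\beta^2} d(x_i,K) + \underbrace{\sqrt{m}\sqrt{\sum_{j=1}^m\max\Big\{\|x_i^j - x_i^{j-1}\| - \frac{1}{4m^4\beta^2} d(x_i,K), 0\Big\}^2}}_{(*)}.
\end{aligned} \tag{3.12}
\]

By the analysis in (3.11) (applying the claim at each of the $m$ inner steps, each of which
increases the distance to $K$ by at most $\frac{1}{4m^4\beta^2} d(x_i,K)$ before the squared term is subtracted),
the fact that $d(x_{i+1},K)^2\ge 0$ gives

\[
0\le d(x_{i+1},K)^2\le\Big(1 + \frac{1}{4m^3\beta^2}\Big)^2 d(x_i,K)^2 - \sum_{j=1}^m\max\Big\{\|x_i^j - x_i^{j-1}\| - \frac{1}{4m^4\beta^2} d(x_i,K), 0\Big\}^2.
\]

We thus deduce that the term marked $(*)$ in (3.12) is at most

\[
\sqrt{m}\sqrt{1 + \frac{1}{2m^3\beta^2} + \frac{1}{16m^6\beta^4}}\; d(x_i,K).
\]

Thus the constant $c$ in Lemma 3.5 can be taken to be what was given in (3.7b). □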


Remark 3.9. (On the condition $\eta > 0$ in Theorem 3.8) The condition $\eta > 0$ is
required in the proof of Theorem 3.8 only when $|S_i^j| > 1$, i.e., when halfspaces are
aggregated. So in the case of alternating projections, the weaker condition of the local
metric inequality is sufficient.


4. Connections with the Newton method 

To find a point in $\{x\in\mathbb{R}^n : F(x) = 0\}$ for some smooth $F:\mathbb{R}^n\to\mathbb{R}^m$, the
method of choice is the Newton method, provided that the linear system in the
Newton method can be solved quickly enough. Note that the set $\{x : F(x) = 0\}$
can be written as the intersection of the manifolds $M_j := \{x : F_j(x) = 0\}$ for
$j\in\{1,\dots,m\}$, where $F_j:\mathbb{R}^n\to\mathbb{R}$ is the $j$th component of $F(\cdot)$. Note that the
manifolds $M_j$ are of codimension 1 (where $\nabla F_j\neq 0$). This section gives conditions under which the
SHQP strategy can converge superlinearly or quadratically when the sets involved
satisfy the conditions for fast convergence in the Newton method.
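For comparison with the projection-based schemes, the standard Newton iteration $x\leftarrow x - J(x)^{-1}F(x)$, with $J$ the Jacobian of $F$, can be sketched in Python (assuming NumPy) for the hypothetical example $m = n = 2$ with $F_j(x) = \|x - c_j\|^2 - 1$. Each $M_j$ is then a unit circle of codimension 1, and $\{x : F(x) = 0\}$ consists of the two points $(1/2, \pm\sqrt{3}/2)$.

```python
import numpy as np

CENTERS = np.array([[0.0, 0.0], [1.0, 0.0]])

def F(x):
    """F_j(x) = ||x - c_j||^2 - 1, so M_j = {F_j = 0} is the unit circle around c_j."""
    return np.sum((x - CENTERS) ** 2, axis=1) - 1.0

def jacobian(x):
    """Row j is the gradient 2 (x - c_j) of F_j."""
    return 2.0 * (x - CENTERS)

def newton(x0, iters=8):
    """Newton's method x <- x - J(x)^{-1} F(x) for F(x) = 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(jacobian(x), F(x))
    return x

x = newton([0.4, 1.0])
print(x, F(x))          # close to (0.5, sqrt(3)/2), with F(x) close to 0
```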

The following result was proved in [Pan15b] for convex sets, but it is readily generalized
to Clarke regular sets, which we do now.

Theorem 4.1. (Supporting hyperplane near a point) Suppose $C\subset\mathbb{R}^n$ is Clarke
regular, and let $\bar x\in C$. Then for any $\epsilon > 0$, there is a $\delta > 0$ such that for any point
$x\in[\mathbb{B}_\delta(\bar x)\cap C]\backslash\{\bar x\}$ and supporting hyperplane of $C$ at $x$ with unit normal $v\in N_C(x)$,
we have

\[
\langle v, x - \bar x\rangle\le\epsilon\|x - \bar x\|. \tag{4.1}
\]
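Proof. Let $\delta$ be small enough so that for any $x\in[\mathbb{B}_\delta(\bar x)\cap C]\backslash\{\bar x\}$ and unit normal
$v\in N_C(x)$, we can find $\bar v\in N_C(\bar x)$ such that $\|v - \bar v\|\le\frac{\epsilon}{2}$ and that
$\langle\bar v, x - \bar x\rangle\le\frac{\epsilon}{2}\|x - \bar x\|$. Then we have

\[
\begin{aligned}
\langle v, x - \bar x\rangle &= \langle v - \bar v, x - \bar x\rangle + \langle\bar v, x - \bar x\rangle\\
&\le \|v - \bar v\|\,\|x - \bar x\| + \frac{\epsilon}{2}\|x - \bar x\|\\
&\le \epsilon\|x - \bar x\|.
\end{aligned}
\]

Thus we are done. □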

We identify a property that will give multiple-term quadratic convergence. Compare
this property to that in Theorem 4.1.

Definition 4.2. (Second order supporting hyperplane property) Suppose $C\subset\mathbb{R}^n$
is a closed convex set, and let $\bar x\in C$. We say that $C$ has the second order supporting
hyperplane (SOSH) property at $\bar x$ (or more simply, $C$ is SOSH at $\bar x$) if there are
$\delta > 0$ and $M > 0$ such that for any point $x\in[\mathbb{B}_\delta(\bar x)\cap C]\backslash\{\bar x\}$ and $v\in N_C(x)$ such
that $\|v\| = 1$, we have

\[
\langle v, x - \bar x\rangle\le M\|x - \bar x\|^2. \tag{4.2}
\]

It is clear how (4.1) compares with (4.2). The next two results show that SOSH
is prevalent in applications.

Proposition 4.3. (Smoothness implies SOSH) Suppose the function $f:\mathbb{R}^n\to\mathbb{R}$ is $C^2$
at $\bar x$. Then the set $C = \{x \mid f(x)\le 0\}$ is SOSH at $\bar x$.

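Proof. Consider $x,\bar x\in C$. In order for the problem to be meaningful, we shall only
consider the case where $f(x) = 0$. We also assume that $f(\bar x) = 0$ and $\nabla f(\bar x)\neq 0$, so that $C$ has
a tangent hyperplane at $\bar x$. An easy calculation gives $N_C(x) = \mathbb{R}_+\{\nabla f(x)\}$ and
$N_C(\bar x) = \mathbb{R}_+\{\nabla f(\bar x)\}$.

Without loss of generality, let $\bar x = 0$. We have

\[
\begin{aligned}
& f(x) = f(0) + \nabla f(0)x + \tfrac{1}{2}x^T\nabla^2 f(0)x + o(\|x\|^2)\\
\implies\ & \nabla f(0)x + \tfrac{1}{2}x^T\nabla^2 f(0)x = o(\|x\|^2).
\end{aligned}
\]

Since $f(x) = f(0) = 0$ and $[\nabla f(0) - \nabla f(x)]x = -x^T\nabla^2 f(0)x + o(\|x\|^2)$, we have

\[
-\nabla f(x)x = [\nabla f(0) - \nabla f(x)]x + \tfrac{1}{2}x^T\nabla^2 f(0)x + o(\|x\|^2) = O(\|x\|^2).
\]

Since $\nabla f(0)\neq 0$, $\|\nabla f(x)\|$ is bounded away from zero for $x$ near $0$, so the unit normal
$v = \nabla f(x)/\|\nabla f(x)\|$ satisfies $|\langle v, x\rangle| = O(\|x\|^2)$, which is (4.2). Therefore, we are done. □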
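Proposition 4.3 can be seen on a concrete example of our own choosing: $f(u,w) = u^2 - w$ is smooth with $\nabla f(0)\neq 0$, and $C = \{f\le 0\}$ is the region above a parabola. At the boundary point $x = (t, t^2)$ near $\bar x = 0$, the ratio in (4.2) equals $1/(\sqrt{1 + 4t^2}\,(1 + t^2))\le 1$, so $M = 1$ works. The Python sketch (assuming NumPy; names hypothetical) evaluates this ratio numerically.

```python
import numpy as np

def sosh_ratio(t):
    """For C = {(u, w) : f(u, w) = u**2 - w <= 0} and xbar = (0, 0), return
    <v, x - xbar> / ||x - xbar||**2 at the boundary point x = (t, t**2),
    where v = grad f(x) / ||grad f(x)|| is the unit normal at x."""
    x = np.array([t, t * t])
    g = np.array([2.0 * t, -1.0])         # grad f(u, w) = (2u, -1)
    v = g / np.linalg.norm(g)
    return (v @ x) / (x @ x)

ts = np.concatenate([-np.logspace(-4, 0, 50), np.logspace(-4, 0, 50)])
ratios = np.array([sosh_ratio(t) for t in ts])
# Closed form: 1 / (sqrt(1 + 4 t^2) (1 + t^2)), so (4.2) holds near xbar with M = 1.
print(ratios.max())
```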

Proposition 4.4. (SOSH under intersection) Suppose $K_l\subset\mathbb{R}^n$ are closed sets
that are SOSH at $\bar x$ for $l\in\{1,\dots,m\}$. Let $K := \bigcap_{l=1}^m K_l$ and suppose that $\{K_l\}_{l=1}^m$
satisfies the linearly regular intersection property at $\bar x$. Then $K$ is SOSH at $\bar x$.

Proof. Since each $K_l$ is SOSH at $\bar x$, we can find $\delta > 0$ and $M > 0$ such that for all
$l\in\{1,\dots,m\}$, $x\in K_l\cap\mathbb{B}_\delta(\bar x)$ and $v\in N_{K_l}(x)$, we have

\[
\langle v, x - \bar x\rangle\le M\|v\|\,\|x - \bar x\|^2.
\]

Claim 1: We can reduce $\delta > 0$ if necessary so that

\[
\sum_{l=1}^m v_l = 0,\ v_l\in N_{K_l}(x) \text{ and } x\in K\cap\mathbb{B}_\delta(\bar x) \text{ implies } v_l = 0 \text{ for all } l\in\{1,\dots,m\}. \tag{4.3}
\]

NONCONVEX FEASIBILITY: PROJECTIONS, NEWTON METHOD 




Suppose otherwise. Then we can find $\{x_i\}\subset K$ such that $\lim_i x_i = \bar x$ and for
all $i > 0$, there exist $v_{l,i}\in N_{K_l}(x_i)$ such that $\sum_{l=1}^m v_{l,i} = 0$ but not all $v_{l,i} = 0$.
We can normalize so that $\|v_{l,i}\|\le 1$, and for each $i$, $\max_l\|v_{l,i}\| = 1$. By taking a
subsequence if necessary, we can assume that $\lim_i v_{l,i}$, say $v_l$, exists for all $l$. Not
all $v_l$ can be zero, but $\sum_{l=1}^m v_l = 0$. The outer semicontinuity of the normal cone
mapping implies that $v_l\in N_{K_l}(\bar x)$. This contradicts the linearly regular intersection
property, which ends the proof of Claim 1.

Claim 2: There exists a constant $M'$ such that whenever $x\in\mathbb{B}_\delta(\bar x)\cap K$,
$v_l\in N_{K_l}(x)$ and $v = \sum_{l=1}^m v_l$, then $\max_l\|v_l\|\le M'\|v\|$.

Suppose otherwise. Then for each $i$, there exist $x_i\in\mathbb{B}_\delta(\bar x)\cap K$ and
$v_{l,i}\in N_{K_l}(x_i)$ such that $v_i = \sum_{l=1}^m v_{l,i}$, $\|v_i\| < 1/i$ and $\max_l\|v_{l,i}\| = 1$ for all $i$. As we
take limits to infinity, this would imply that (4.3) is violated, a contradiction. This
ends the proof of Claim 2.

Since (4.3) is satisfied, we have $N_K(x) = \sum_{l=1}^m N_{K_l}(x)$ for all $x\in\mathbb{B}_\delta(\bar x)\cap K$
by the intersection rule for normal cones in [RW98, Theorem 6.42]. Then each
$v\in N_K(x)$ can be written as a sum of elements in $N_{K_l}(x)$, say $v = \sum_{l=1}^m v_l$, where
$v_l\in N_{K_l}(x)$ and $\max_l\|v_l\|\le M'\|v\|$. Then

\[
\langle v, x - \bar x\rangle = \sum_{l=1}^m\langle v_l, x - \bar x\rangle\le M\|x - \bar x\|^2\sum_{l=1}^m\|v_l\|\le M\|x - \bar x\|^2\, mM'\|v\|.
\]

Thus we are done. □

We now make a connection to the Newton method. Consider the mass projection 
algorithm. 

Theorem 4.5. (Connection to Newton method) Consider Algorithm 3.1 for the
case when $S_1 = \{1,\dots,m\}$ and $S_j = \emptyset$ for all $j\in\{2,\dots,m\}$ at all iterations $i$,
and $S_i^j = \{j\}\times S_j$ for all $j\in\{1,\dots,m\}$. See Remark 3.2. Let $x^*\in K := \bigcap_{l=1}^m K_l$.
Suppose the following hold:

(1) Each set $K_l$ is super-regular.

(2) For each $l\in\{1,\dots,m\}$, $K_l$ is either a manifold, or $N_{K_l}(x)$ contains at
most one point of norm 1 for all $x\in K_l$ near $x^*$.

(3) The sets $\{K_l\}_{l=1}^m$ have linearly regular intersection at $x^*$.

Then, provided $x_0$ is close enough to $x^*$, the convergence of the iterates $\{x_i\}$ to some
$\bar x\in K$ is superlinear. Furthermore, the convergence is quadratic if all the sets $K_l$
satisfy the SOSH property.

Proof. By Theorem 3.8, the convergence of the iterates $\{x_i\}$ to $\bar x$ is assured. What
remains is to prove that the convergence is actually superlinear, or quadratic under
the additional assumption. Without loss of generality, let $\bar x = 0$. We first prove
the superlinear convergence. The proof of Theorem 3.8 assures that there is some
$\beta > 1$ such that $d(x_i,K)\le\beta\max_l d(x_i,K_l)$ for all iterates $x_i$.

Let $x_i$ be an iterate. Recall that $x_{i,1,l}\in P_{K_l}(x_i)$. The projection of $x_i$ onto the
polyhedron $\bigcap_{l=1}^m H_{i,1,l}$ gives $x_{i+1}$. Let $v_j^+$ be the unit normal in $N_{K_j}(x_{i+1,1,j})$ in the direction
of $x_{i+1} - x_{i+1,1,j}$, and let $v_j^0$ be the unit normal in $N_{K_j}(x_{i,1,j})$ that is close to $v_j^+$.
The proof of Theorem 3.8 uses Lemma 3.5. Hence there are constants $c$ and $\rho \in (0,1)$ such that $\|x_i\| \le \frac{c}{1-\rho}\,d(x_i, K)$ for all $i$. By the local metric inequality, let the index $j$ be such that $d(x_{i+1}, K) \le \beta\, d(x_{i+1}, K_j)$. We let $\kappa := \frac{\beta c}{1-\rho}$. Then

$$\langle v_j^+, x_{i+1} - x_{i+1,1,j}\rangle = \|x_{i+1} - x_{i+1,1,j}\| = d(x_{i+1}, K_j) \ge \frac{1}{\beta}\, d(x_{i+1}, K) \ge \frac{1}{\kappa}\|x_{i+1}\|. \tag{4.4}$$

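Consider a neighborhood $U$ of $\bar x$ such that if $x \in U$ and $v \in N_{K_j}(x)\setminus\{0\}$, then $\left\|\frac{v}{\|v\|} - \frac{\bar v}{\|\bar v\|}\right\| \le \frac{1}{8\kappa}$ for some $\bar v \in N_{K_j}(\bar x)\setminus\{0\}$. If $i$ is large enough, then $x_i \in U$ and $x_{i,1,j} \in U$ for all $j \in \{1,\dots,m\}$, which leads to

$$\|v_j^0 - v_j^+\| \le \|v_j^0 - \bar v_j\| + \|v_j^+ - \bar v_j\| \le \frac{1}{4\kappa}, \tag{4.5}$$

where $\bar v_j$ is the appropriate unit vector in $N_{K_j}(\bar x)$. For any $\delta > 0$, we can reduce the neighborhood $U$ if necessary so that, by super-regularity,

$$\langle v_j^+, 0 - x_{i+1,1,j}\rangle \le \delta\|x_{i+1,1,j}\|. \tag{4.6}$$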

Claim: $\|x_{i+1,1,j}\| \le \frac{1}{\sqrt{1-\delta^2}}\|x_{i+1}\|$.

We know that $x_{i+1} = x_{i+1,1,j} + t v_j^+$, where $t = \|x_{i+1} - x_{i+1,1,j}\| \ge 0$. By super-regularity, we have $\cos^{-1}\delta \le \angle\, x_{i+1}\, x_{i+1,1,j}\, 0$. Note that $\sqrt{1-\delta^2} = \sin\cos^{-1}\delta$. Some simple trigonometry ends the proof of the claim.
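The "simple trigonometry" can also be replaced by a short estimate that uses only (4.6) and $\|v_j^+\| = 1$. As a sketch, write $y := x_{i+1,1,j}$ and $v := v_j^+$ (shorthand introduced only for this derivation), so that $x_{i+1} = y + tv$ with $t \ge 0$:

```latex
\begin{aligned}
\|x_{i+1}\|^2 &= \|y + t v\|^2 = \|y\|^2 + 2t\langle v, y\rangle + t^2 \\
&\ge \|y\|^2 - 2t\delta\|y\| + t^2
   && \text{by (4.6): } \langle v, 0 - y\rangle \le \delta\|y\| \\
&= (t - \delta\|y\|)^2 + (1 - \delta^2)\|y\|^2 \;\ge\; (1 - \delta^2)\|y\|^2 .
\end{aligned}
```

Taking square roots gives $\|x_{i+1,1,j}\| \le \|x_{i+1}\|/\sqrt{1-\delta^2}$, which is the claim.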

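Choose $\delta$ small enough so that $\frac{\delta}{\sqrt{1-\delta^2}} \le \frac{1}{4\kappa}$. From (4.6) and the claim, we have

$$\langle v_j^+, 0 - x_{i+1,1,j}\rangle \le \delta\|x_{i+1,1,j}\| \le \frac{\delta}{\sqrt{1-\delta^2}}\|x_{i+1}\| \le \frac{1}{4\kappa}\|x_{i+1}\|. \tag{4.7}$$

Then combining (4.4), (4.7) and (4.5), we get

$$\begin{aligned}\langle v_j^0, x_{i+1}\rangle &= \langle v_j^+, x_{i+1} - x_{i+1,1,j}\rangle + \langle v_j^+, x_{i+1,1,j}\rangle + \langle v_j^0 - v_j^+, x_{i+1}\rangle \\ &\ge \frac{1}{\kappa}\|x_{i+1}\| - \frac{1}{4\kappa}\|x_{i+1}\| - \frac{1}{4\kappa}\|x_{i+1}\| = \frac{1}{2\kappa}\|x_{i+1}\|.\end{aligned} \tag{4.8}$$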

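Choose any $\varepsilon > 0$. Theorem 4.1 implies that $\langle v_j^0, x_{i,1,j}\rangle \le \varepsilon\|x_{i,1,j}\|$ for all $i$ large enough. We have the following set of inequalities:

$$\langle v_j^0, x_{i+1}\rangle \le \langle v_j^0, x_{i,1,j}\rangle \le \varepsilon\|x_{i,1,j}\| \le \frac{\varepsilon}{\sqrt{1-\delta^2}}\|x_i\|. \tag{4.9}$$

(The first inequality comes from the fact that $x_{i+1}$ has to lie in the halfspaces constructed by the previous projection. If $K_j$ is a manifold, then the first inequality is in fact an equation. The last inequality is from the highlighted claim above, applied with $i$ in place of $i+1$.) Combining (4.8) and (4.9) gives $\|x_{i+1}\| \le \frac{2\kappa\varepsilon}{\sqrt{1-\delta^2}}\|x_i\|$. Since $\varepsilon > 0$ is arbitrary, this is the superlinear convergence we need.

In the case where $K_j$ has the SOSH property near $\bar x$, (4.9) can be changed to give $\langle v_j^0, x_{i+1}\rangle \le \frac{M}{1-\delta^2}\|x_i\|^2$ for some constant $M$, which gives $\|x_{i+1}\| \le \frac{2\kappa M}{1-\delta^2}\|x_i\|^2$. This completes the proof. □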
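To see the Newton-like behaviour of Theorem 4.5 on a concrete instance, the following Python sketch runs the mass projection step on a hypothetical toy problem that is not taken from the paper: $K_1$ is the unit circle in $\mathbb{R}^2$ (a manifold, so it is replaced by its tangent hyperplane at the projection) and $K_2$ is the line $\{y : y_2 = 0.5\}$; they meet transversally at $(\sqrt{0.75}, 0.5)$. With two sets in $\mathbb{R}^2$, the polyhedron is the single intersection point of the two hyperplanes, so each step amounts to solving a $2\times 2$ linear system. All function names are illustrative.

```python
import math

# Hypothetical toy instance: K1 = unit circle (a manifold), K2 = line {y : y[1] = 0.5}.


def project_circle(x):
    """Nearest point of the unit circle to x (assumes x != 0)."""
    r = math.hypot(x[0], x[1])
    return (x[0] / r, x[1] / r)


def project_line(x, c=0.5):
    """Nearest point of the line {y : y[1] = c} to x."""
    return (x[0], c)


def mass_projection_step(x):
    """Project x onto both sets, take the hyperplane through each projection
    (normal space of the manifold), and return the intersection of the two hyperplanes."""
    p1 = project_circle(x)
    n1 = p1                      # unit normal of the circle at p1
    p2 = project_line(x)
    n2 = (0.0, 1.0)              # unit normal of the line
    r1 = n1[0] * p1[0] + n1[1] * p1[1]   # hyperplane <n1, y> = <n1, p1>
    r2 = n2[0] * p2[0] + n2[1] * p2[1]   # hyperplane <n2, y> = <n2, p2>
    det = n1[0] * n2[1] - n1[1] * n2[0]
    # Cramer's rule for the 2x2 system [n1; n2] y = [r1; r2]
    return ((r1 * n2[1] - n1[1] * r2) / det,
            (n1[0] * r2 - r1 * n2[0]) / det)


x_star = (math.sqrt(0.75), 0.5)
x = (1.0, 1.0)
errors = []
for _ in range(5):
    x = mass_projection_step(x)
    errors.append(math.dist(x, x_star))
print(errors)
```

In this run the error roughly squares from one step to the next, which is consistent with the quadratic rate of Theorem 4.5 for smooth sets such as these (both satisfy the SOSH property).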

5. An algorithm with arbitrarily fast linear convergence

In this section, we show the arbitrarily fast linear convergence of Algorithm 5.1 for the nonconvex SIP when the sets are super-regular. Motivated by the fast convergent algorithm in [Pan15b], Algorithm 5.1 collects old halfspaces from previous projections to try to accelerate the convergence in later iterations.

We now present an algorithm that can achieve arbitrarily fast linear convergence. 
Algorithm 5.1. (Local super-regular SHQP) Let $K_l$ be (not necessarily convex) closed sets in $\mathbb{R}^n$ for $l \in \{1,\dots,m\}$. From a starting point $x_0 \in \mathbb{R}^n$, this algorithm finds a point in the intersection $K := \bigcap_{l=1}^m K_l$.

Step 0: Set $i = 1$, and let $p$ be some positive integer.

Step 1: Choose $j_i \in \arg\max_j \{d(x_{i-1}, K_j)\}$. (That is, we take only one index, namely one that gives the largest distance.)

Step 2: Choose some $\tau_i \in [0,1)$. Define $x_i^{(j_i)} \in \mathbb{R}^n$, $a_i^{(j_i)} \in \mathbb{R}^n$ and $b_i^{(j_i)} \in \mathbb{R}$ by

$$x_i^{(j_i)} \in P_{K_{j_i}}(x_{i-1}), \tag{5.1a}$$

$$a_i^{(j_i)} = x_{i-1} - x_i^{(j_i)}, \tag{5.1b}$$

$$b_i^{(j_i)} = \langle a_i^{(j_i)}, x_i^{(j_i)}\rangle + \tau_i\langle a_i^{(j_i)}, x_{i-1} - x_i^{(j_i)}\rangle = \langle a_i^{(j_i)}, (1-\tau_i)x_i^{(j_i)} + \tau_i x_{i-1}\rangle. \tag{5.1c}$$

Let $x_i = P_{F_i}(x_{i-1})$, where the set $F_i \subset \mathbb{R}^n$ is defined by

$$F_i := \{x \mid \langle a_l^{(j_l)}, x\rangle \le b_l^{(j_l)} \text{ for } \max(1, i-p) \le l \le i\}. \tag{5.2}$$

Step 3: Set $i \leftarrow i + 1$, and go back to Step 1.

There are some differences between Algorithm 5.1 and [Pan15b, Algorithm 5.1]. Firstly, in Step 1, we take only one index $j$ in $\{1,\dots,m\}$ that gives the largest distance $d(x_{i-1}, K_j)$. Secondly, the term $\tau_i\langle a_i^{(j_i)}, x_{i-1} - x_i^{(j_i)}\rangle$ is added in (5.1c) to account for the nonconvexity of the set $K_{j_i}$.
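A minimal Python sketch of Algorithm 5.1 follows. It illustrates the steps under stated assumptions and is not the author's implementation: the projection onto $F_i$ is computed by brute-force enumeration of candidate active sets (a stand-in for a proper QP solver such as an active set method, sensible only for a handful of halfspaces in low dimension), each normal $a_i^{(j_i)}$ is rescaled to unit length (which leaves the halfspace unchanged), and the test problem (the unit circle and the line $y_2 = 0.5$ in $\mathbb{R}^2$) as well as all function names are hypothetical.

```python
import itertools
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, rhs, eps=1e-12):
    """Solve A y = rhs by Gaussian elimination; return None if A is (nearly) singular."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < eps:
            return None
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * n
    for r in reversed(range(n)):
        y[r] = (M[r][n] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y


def project_polyhedron(x, cuts, tol=1e-12):
    """Projection of x onto {y : <a, y> <= b for (a, b) in cuts}: try every candidate
    active set of size <= dim and keep the closest feasible point."""
    best, best_key = None, None
    for size in range(min(len(cuts), len(x)) + 1):
        for active in itertools.combinations(cuts, size):
            y = list(x)
            if active:
                gram = [[dot(a1, a2) for a2, _ in active] for a1, _ in active]
                lam = solve(gram, [dot(a, x) - b for a, b in active])
                if lam is None:
                    continue
                for l, (a, _) in zip(lam, active):
                    y = [yi - l * ai for yi, ai in zip(y, a)]
            viol = max(dot(a, y) - b for a, b in cuts)
            # Feasible candidates first, ranked by distance; otherwise least violation.
            key = (viol > tol, viol if viol > tol else 0.0, math.dist(x, y))
            if best_key is None or key < best_key:
                best, best_key = y, key
    return best


def local_shqp(x0, projectors, tau=0.1, p=2, iters=200, tol=1e-13):
    """Sketch of Algorithm 5.1 with tau_i = tau for all i."""
    x, cuts = list(x0), []
    for _ in range(iters):
        pts = [P(x) for P in projectors]
        dists = [math.dist(x, q) for q in pts]
        j = max(range(len(pts)), key=lambda l: dists[l])      # Step 1: farthest set
        if dists[j] < tol:
            break
        xj = pts[j]                                             # (5.1a)
        a = [(xi - qi) / dists[j] for xi, qi in zip(x, xj)]     # (5.1b), unit length
        b = dot(a, [(1 - tau) * qi + tau * xi for qi, xi in zip(xj, x)])  # (5.1c)
        cuts = (cuts + [(a, b)])[-(p + 1):]                     # halfspaces in (5.2)
        x = project_polyhedron(x, cuts)                         # x_i = P_{F_i}(x_{i-1})
    return x


def proj_circle(x):   # hypothetical K_1: the unit circle in R^2
    r = math.hypot(x[0], x[1])
    return [x[0] / r, x[1] / r]


def proj_line(x):     # hypothetical K_2: the line {y : y[1] = 0.5}
    return [x[0], 0.5]


x = local_shqp([1.0, 0.9], [proj_circle, proj_line], tau=0.1, p=2)
print(x)
```

Keeping the last $p + 1$ cuts mirrors the index range $\max(1, i-p) \le l \le i$ in (5.2); for larger problems the enumeration should be replaced by a warm-started QP solver.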

The parameters $\tau_i$ in Algorithm 5.1 require tuning to achieve fast convergence. This tuning may not be easy to perform.

Lemma 5.2. (Convergence of Algorithm 5.1) Suppose that in Algorithm 5.1, the sets $K_l$ are all super-regular at a point $x^* \in K = \bigcap_{l=1}^m K_l$ for all $l \in \{1,\dots,m\}$, and the local metric inequality holds, i.e., there are a $\beta > 0$ and a neighborhood $V_1$ of $x^*$ such that

$$d\Big(x, \bigcap_{l=1}^m K_l\Big) \le \beta \max_{1 \le l \le m} d(x, K_l) \quad \text{for all } x \in V_1. \tag{5.3}$$

Then for any $\tau \in (0,1)$, we can find a neighborhood $U$ of $x^*$ such that

• For any $x_0 \in U$, Algorithm 5.1 with $\tau_i = \tau$ for all $i$ generates a sequence $\{x_i\}$ that converges to some $\bar x \in K$ so that

$$\|x_{i+1} - \bar x\| \le \|x_i - \bar x\| \quad \text{for all } i \ge 0, \tag{5.4}$$

and

$$\|x_i - \bar x\| \le L \max_{1 \le l \le m} d(x_i, K_l), \tag{5.5}$$

where

$$\rho := \frac{\sqrt{\beta^2 - (1-\tau)^2}}{\beta} \quad\text{and}\quad L := \frac{\beta}{1-\rho}. \tag{5.6}$$

Proof. By the super-regularity of the sets $K_l$, for any $\delta > 0$, there exists a neighborhood $V_2$ of $x^*$ such that for any $l \in \{1,\dots,m\}$, we have

$$\langle z - y, v\rangle \le \delta\|z - y\|\|v\| \quad \text{for all } z, y \in K_l \cap V_2,\ v \in N_{K_l}(y). \tag{5.7}$$

We choose $\delta > 0$ small enough so that $\delta < \frac{\tau(1-\rho)}{2\beta^2}$.
Claim: If $x_{i-1}$ is such that $B\big(x_{i-1}, \frac{1}{1-\rho}d(x_{i-1}, K)\big) \subset V_1 \cap V_2$, then $K \cap B\big(x_{i-1}, \frac{1}{1-\rho}d(x_{i-1}, K)\big) \subset H_i$, where the halfspace $H_i := \{x : \langle a_i^{(j_i)}, x\rangle \le b_i^{(j_i)}\}$ is defined by (5.1).

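Proof of Claim: Suppose $x' \in K \cap B\big(x_{i-1}, \frac{1}{1-\rho}d(x_{i-1}, K)\big)$. Since $x'$ and $x_i^{(j_i)}$ both lie in $K_{j_i} \cap V_2$ and $x_{i-1} - x_i^{(j_i)} \in N_{K_{j_i}}(x_i^{(j_i)})$, (5.7) gives

$$\langle x_{i-1} - x_i^{(j_i)}, x' - x_i^{(j_i)}\rangle \le \delta\|x_{i-1} - x_i^{(j_i)}\|\|x' - x_i^{(j_i)}\|,$$

where $x_i^{(j_i)}$ is the point in $P_{K_{j_i}}(x_{i-1}) \subset K_{j_i}$ in (5.1a). Also, $x'$ was assumed to lie in $B\big(x_{i-1}, \frac{1}{1-\rho}d(x_{i-1}, K)\big)$. Note that $\|x_{i-1} - x_i^{(j_i)}\| \le d(x_{i-1}, K)$. So we have

$$\|x' - x_i^{(j_i)}\| \le \|x' - x_{i-1}\| + \|x_{i-1} - x_i^{(j_i)}\| \le \Big(\frac{1}{1-\rho} + 1\Big)d(x_{i-1}, K).$$

Note that $\frac{1}{1-\rho} + 1 \le \frac{2}{1-\rho}$. From the above inequality, we have

$$\langle x_{i-1} - x_i^{(j_i)}, x' - x_i^{(j_i)}\rangle \le \delta\|x_{i-1} - x_i^{(j_i)}\| \cdot \frac{2}{1-\rho}d(x_{i-1}, K) \le \frac{2\delta}{1-\rho}d(x_{i-1}, K)^2.$$

Recall that $\delta < \frac{\tau(1-\rho)}{2\beta^2}$. The local metric inequality gives $\|x_{i-1} - x_i^{(j_i)}\| \ge \frac{1}{\beta}d(x_{i-1}, K)$, so

$$\langle x_{i-1} - x_i^{(j_i)}, x' - x_i^{(j_i)}\rangle \le \frac{\tau}{\beta^2}d(x_{i-1}, K)^2 \le \tau\|x_{i-1} - x_i^{(j_i)}\|^2.$$

The above inequality is precisely $\langle a_i^{(j_i)}, x'\rangle \le b_i^{(j_i)}$, so $x' \in H_i$. This ends the proof of the claim.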

Suppose $B\big(x_0, \frac{1}{1-\rho}d(x_0, K)\big) \subset V_1 \cap V_2$. If the conditions of Lemma 3.5 are satisfied, then we have convergence to some $\bar x$.

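We try to prove that $d(x_{i+1}, K) \le \rho\, d(x_i, K)$. Recall that $x_{i+1} = P_{F_{i+1}}(x_i)$. By making use of the claim above, the previously generated halfspaces all contain $K \cap B\big(x_i, \frac{1}{1-\rho}d(x_i, K)\big)$, so $F_{i+1}$ is a polyhedron that contains $K \cap B\big(x_i, \frac{1}{1-\rho}d(x_i, K)\big)$. It is clear that $K \cap B\big(x_i, \frac{1}{1-\rho}d(x_i, K)\big) \neq \emptyset$, so $F_{i+1}$ is nonempty. Since $F_{i+1}$ contains the points of $P_K(x_i)$, we have $d(x_i, F_{i+1}) \le d(x_i, K)$, so $\|x_i - x_{i+1}\| \le d(x_i, K)$. The distance $d(x_i, H_{i+1})$ is at least $\frac{1-\tau}{\beta}d(x_i, K)$, so $\|x_i - x_{i+1}\| \ge \frac{1-\tau}{\beta}d(x_i, K)$. Since $x_{i+1}$ is the projection of $x_i$ onto the convex set $F_{i+1}$, which contains the points of $P_K(x_i)$, we then have

$$\begin{aligned} d(x_{i+1}, K)^2 &\le d(x_i, K)^2 - \|x_i - x_{i+1}\|^2 \\ &\le d(x_i, K)^2 - \frac{(1-\tau)^2}{\beta^2}d(x_i, K)^2 = \frac{\beta^2 - (1-\tau)^2}{\beta^2}\,d(x_i, K)^2 = \rho^2 d(x_i, K)^2. \end{aligned}$$

We can now apply Lemma 3.5. The conclusion (5.4) comes from the fact that $\{x_i\}$, by construction, is obtained by projections onto convex sets that contain $\bar x$, together with the theory of Fejér monotonicity. The conclusion (5.5) is straightforward from Lemma 3.5(a) and the local metric inequality. □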
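The constants in (5.6) are easy to evaluate. The following Python sketch computes $\rho$, $L$ and the bound on $\delta$ used in the proof (taken here as $\delta < \tau(1-\rho)/(2\beta^2)$), and checks that $\rho^2 = 1 - (1-\tau)^2/\beta^2$ is exactly the squared contraction factor in the last display. The function name is hypothetical. For $\beta = 1.25$ and $\tau = 0.25$ one gets $\rho = 0.8$, $L = 6.25$ and a $\delta$-bound of $0.016$, up to floating-point rounding.

```python
import math


def lemma52_constants(beta, tau):
    """Return rho and L from (5.6) and the delta bound used in the proof of Lemma 5.2.

    Assumes beta >= 1 (which holds automatically, since d(x, K) >= d(x, K_l) for every l)
    and 0 < tau < 1."""
    if beta < 1 or not 0 < tau < 1:
        raise ValueError("need beta >= 1 and 0 < tau < 1")
    rho = math.sqrt(beta ** 2 - (1 - tau) ** 2) / beta
    L = beta / (1 - rho)
    delta_bound = tau * (1 - rho) / (2 * beta ** 2)
    return rho, L, delta_bound


rho, L, delta_bound = lemma52_constants(beta=1.25, tau=0.25)
# Squared contraction factor from the proof: 1 - (1 - tau)^2 / beta^2
contraction_sq = 1 - (1 - 0.25) ** 2 / 1.25 ** 2
print(rho, L, delta_bound, math.sqrt(contraction_sq))
```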


We now prove the theorem on the arbitrarily fast multiple-term linear convergence of Algorithm 5.1.

Theorem 5.3. (Arbitrarily fast linear convergence) Consider the setting of Lemma 5.2. If $p$ in Algorithm 5.1 is finite and sufficiently large, then for all $\tau \in (0, 0.5)$ (independent of $p$) we can find a neighborhood $U$ of $x^*$ such that if $x_0 \in U$, then the iterates of Algorithm 5.1 with $\tau_i = \tau$ converge to some $\bar x \in K$. Moreover,

$$\limsup_{i\to\infty} \frac{\|x_{i+p} - \bar x\|}{\|x_i - \bar x\|} \le 8L\tau, \tag{5.8}$$

where $L = \frac{\beta}{1-\rho}$ and $\rho = \frac{\sqrt{\beta^2 - 1/4}}{\beta}$.
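The right-hand side of (5.8) can be evaluated directly. The Python sketch below (hypothetical helper names) computes $8L\tau$ with $\rho$ and $L$ as in Theorem 5.3, and the value of $\tau$ at which this bound equals 1, so that any smaller $\tau$ gives a multiple-term contraction factor below 1.

```python
import math


def theorem53_rate(beta, tau):
    """Right-hand side 8*L*tau of (5.8), with rho = sqrt(beta^2 - 1/4)/beta, L = beta/(1 - rho)."""
    if beta < 1 or not 0 < tau < 0.5:
        raise ValueError("need beta >= 1 and 0 < tau < 0.5")
    rho = math.sqrt(beta ** 2 - 0.25) / beta
    L = beta / (1 - rho)
    return 8 * L * tau


def tau_for_unit_rate(beta):
    """The tau at which 8*L*tau equals 1; any smaller tau makes the bound in (5.8) less than 1."""
    rho = math.sqrt(beta ** 2 - 0.25) / beta
    return (1 - rho) / (8 * beta)


for beta in (1.0, 2.0, 5.0):
    print(beta, tau_for_unit_rate(beta), theorem53_rate(beta, 0.001))
```

The admissible $\tau$ shrinks quickly as $\beta$ grows, since $1 - \rho$ is of order $1/\beta^2$; this is one reason why the tuning of $\tau_i$ mentioned after Algorithm 5.1 may be delicate.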

Proof. The basic strategy is to prove the inequalities (5.9) and (5.10) as in [Pan15b, Theorem 5.12], with a bit more attention put into handling the nonconvexity.

By Lemma 5.2, the convergence of the iterates $\{x_i\}$ to some $\bar x \in K$ is assured.

Without loss of generality, suppose that $\bar x = 0$. Let $v_i^* := \frac{x_i - x_{i+1}^{(j_{i+1})}}{\|x_i - x_{i+1}^{(j_{i+1})}\|}$, where $x_{i+1}^{(j_{i+1})}$ is defined through (5.1a).

The sphere $S^{n-1} := \{w \in \mathbb{R}^n : \|w\| = 1\}$ is compact. Suppose $p$ is such that we can cover $S^{n-1}$ with $p$ balls of radius $\frac{1}{4L}$. By the pigeonhole principle, we can find $j$ and $k$ such that $i \le j < k \le i + p$ and $v_j^*$ and $v_k^*$ belong to the same ball of radius $\frac{1}{4L}$ covering $S^{n-1}$. We thus have $\|v_j^* - v_k^*\| \le \frac{1}{2L}$. (The key in choosing $p$ is to obtain the last inequality.)

We shall prove that if $i$ is large enough, we have the two inequalities

$$\langle v_j^*, x_k\rangle \le 2\tau\|x_j\| \tag{5.9}$$

and

$$\frac{1}{4L}\|x_k\| \le \langle v_j^*, x_k\rangle. \tag{5.10}$$

In view of the Fejér monotonicity condition (5.4), these two inequalities give $\|x_{i+p}\| \le \|x_k\| \le 8L\tau\|x_j\| \le 8L\tau\|x_i\|$, which gives the conclusion we seek.

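We first prove (5.9). Since $x_k$ lies in $F_k$, it lies in the halfspace with normal $v_j^*$ passing through $(1-\tau)x_{j+1}^{(j_{j+1})} + \tau x_j$. (Recall that $x_{j+1}^{(j_{j+1})}$ was defined in (5.1a), and lies in $P_{K_{j_{j+1}}}(x_j)$.) This gives us

$$\begin{aligned}\langle v_j^*, x_k\rangle &\le \langle v_j^*, (1-\tau)x_{j+1}^{(j_{j+1})} + \tau x_j\rangle \\ &= (1-\tau)\langle v_j^* - v, x_{j+1}^{(j_{j+1})}\rangle + (1-\tau)\langle v, x_{j+1}^{(j_{j+1})}\rangle + \tau\langle v_j^*, x_j\rangle,\end{aligned} \tag{5.11}$$

where $v$ is some vector with norm 1 in $N_{K_{j_{j+1}}}(\bar x)$. Since $\lim_{i\to\infty} x_i = \bar x$, we can assume that $\{x_i\}$ is sufficiently close to $\bar x$ so that:

(1) the vector $v$, by the outer semicontinuity of the normal cone mapping $x \mapsto N_{K_{j_{j+1}}}(x)$, can be chosen such that $\|v_j^* - v\| \le \frac{\tau}{4}$, and

(2) by the super-regularity of $K_{j_{j+1}}$ at $\bar x$, we have $\langle v, x_{j+1}^{(j_{j+1})}\rangle \le \frac{\tau}{4}\|x_{j+1}^{(j_{j+1})}\|$.

Note that $(1-\tau)x_{j+1}^{(j_{j+1})} + \tau x_j$ is the projection of $x_j$ onto one of the halfspaces defining $F_{j+1}$, and that $\tau < \frac12$. Since $x_{j+1}^{(j_{j+1})}$ is a nearest point of $K_{j_{j+1}}$ to $x_j$ and $\bar x = 0 \in K_{j_{j+1}}$, we have $\|x_j - x_{j+1}^{(j_{j+1})}\| \le \|x_j\|$, hence $\|x_{j+1}^{(j_{j+1})}\| \le 2\|x_j\|$. Since $\|v_j^*\| = 1$, we have $\langle v_j^*, x_j\rangle \le \|x_j\|$. Continuing the arithmetic in (5.11), we have

$$\begin{aligned}\langle v_j^*, x_k\rangle &\le (1-\tau)\langle v_j^* - v, x_{j+1}^{(j_{j+1})}\rangle + (1-\tau)\langle v, x_{j+1}^{(j_{j+1})}\rangle + \tau\langle v_j^*, x_j\rangle \\ &\le (1-\tau)\Big(\frac{\tau}{4} + \frac{\tau}{4}\Big)2\|x_j\| + \tau\|x_j\| \\ &\le \tau\|x_j\| + \tau\|x_j\| = 2\tau\|x_j\|.\end{aligned}$$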
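This ends the proof of (5.9). Next, we prove (5.10). Recall that $x_{k+1}^{(j_{k+1})} \in P_{K_{j_{k+1}}}(x_k)$ was defined in (5.1a). Note that, provided $\tau < \frac12$, the quantity $\rho_\tau := \frac{\sqrt{\beta^2 - (1-\tau)^2}}{\beta}$ in (5.6) is less than $\rho = \frac{\sqrt{\beta^2 - 1/4}}{\beta}$. Hence the constant $L_\tau := \frac{\beta}{1-\rho_\tau}$ in (5.6) is less than $L = \frac{\beta}{1-\rho}$. By using the definition $v_k^* = \frac{x_k - x_{k+1}^{(j_{k+1})}}{\|x_k - x_{k+1}^{(j_{k+1})}\|}$ and (5.5), we have

$$\langle v_k^*, x_k - x_{k+1}^{(j_{k+1})}\rangle = \|x_k - x_{k+1}^{(j_{k+1})}\| = d(x_k, K_{j_{k+1}}) \ge \frac{1}{L_\tau}\|x_k\| \ge \frac{1}{L}\|x_k\|. \tag{5.12}$$

By the super-regularity of $K_{j_{k+1}}$ at $\bar x$ and the fact that $\bar x = \lim_{i\to\infty} x_i$, we can assume that $x_k$ is close enough to $\bar x$ so that

$$\langle v_k^*, 0 - x_{k+1}^{(j_{k+1})}\rangle \le \frac{1}{8L}\|x_{k+1}^{(j_{k+1})}\| \le \frac{1}{4L}\|x_k\|. \tag{5.13}$$

(Note that the inequality on the right follows from the same proof as that of $\|x_{j+1}^{(j_{j+1})}\| \le 2\|x_j\|$.) Combining (5.12) and (5.13) as well as $\|v_j^* - v_k^*\| \le \frac{1}{2L}$ gives us

$$\begin{aligned}\langle v_j^*, x_k\rangle &= \langle v_k^*, x_k - x_{k+1}^{(j_{k+1})}\rangle + \langle v_k^*, x_{k+1}^{(j_{k+1})}\rangle + \langle v_j^* - v_k^*, x_k\rangle \\ &\ge \frac{1}{L}\|x_k\| - \frac{1}{4L}\|x_k\| - \frac{1}{2L}\|x_k\| = \frac{1}{4L}\|x_k\|.\end{aligned}$$

This ends the proof of (5.10), which concludes the proof of our result. □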


The large parameter $p$ is an upper bound on when we can find $v_j^*$ and $v_k^*$ such that $\|v_j^* - v_k^*\| \le \frac{1}{2L}$. We hope that the bound needed in a practical implementation would be much smaller than $p$.
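To get a feel for how large $p$ is, the following Python sketch counts balls of radius $\frac{1}{4L}$ that cover the circle $S^1$ (the case $n = 2$), with $\rho$ and $L$ as in Theorem 5.3. It uses balls centred on $S^1$, each of which contains an arc of angle $4\arcsin(r/2)$, so the count is a sufficient, not necessarily minimal, choice of $p$; the helper name is hypothetical.

```python
import math


def balls_to_cover_circle(r):
    """Number of balls of radius r, centred on S^1, that suffice to cover S^1.

    A ball of radius r < 2 centred on the circle contains an arc of angle 4*asin(r/2)."""
    if not 0 < r < 2:
        raise ValueError("r must lie in (0, 2)")
    return math.ceil(2 * math.pi / (4 * math.asin(r / 2)))


beta = 1.0
rho = math.sqrt(beta ** 2 - 0.25) / beta    # rho and L as in Theorem 5.3
L = beta / (1 - rho)
p = balls_to_cover_circle(1 / (4 * L))
print(L, p)
```

Already in the plane and for the best case $\beta = 1$ this gives a $p$ between 90 and 100, and covering numbers of $S^{n-1}$ grow roughly like $L^{n-1}$, which is why a much smaller window is hoped for in practice.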


Remark 5.4. (Towards superlinear convergence) The coefficient 8 in (5.8) can be reduced, but this does not detract from the point that, as $\tau \searrow 0$, the right-hand side of (5.8) goes to zero. So there is a choice of parameters $\{\tau_i\}_{i=1}^\infty$ that can be made at each iteration of Algorithm 5.1 so that superlinear convergence is achieved, even though there does not seem to be a good way of choosing how the parameters $\tau_i$ go to zero. If the parameters $\tau_i$ go to zero too fast, the Fejér monotonicity (5.4) of the iterates may not be maintained, which may mean that Lemma 5.2 does not hold, i.e., the iterates $\{x_i\}_{i=1}^\infty$ may not converge. Contrast this with the convex SIP in [Pan15b], where setting $\tau = 0$ gives the multiple-term superlinear convergence

$$\lim_{i\to\infty} \frac{\|x_{i+p} - \bar x\|}{\|x_i - \bar x\|} = 0$$

instead of the multiple-term arbitrarily fast linear convergence (5.8). In view of nonconvexity, the observation in Remark 3.3 has to be overcome, so we believe that this arbitrarily fast convergence is difficult to improve on in general.


Remark 5.5. (Simplification in (5.12)) The inequality $d(x_k, K_{j_{k+1}}) \ge \frac{1}{L}\|x_k\|$ in (5.12) follows easily from (5.5). But in [Pan15b], some effort was spent to prove a positive lower bound on $\limsup_{k\to\infty} d(x_k, K_{j_{k+1}})/\|x_k\|$. The proof of the multiple-term superlinearly convergent algorithm for convex problems in [Pan15b] can thus be shortened considerably.
If some of the sets $K_l$ are known to be convex sets or affine subspaces, then this information can be taken into account by setting the appropriate $\tau_i$ to zero when creating the halfspaces defined by (5.1).

6. Two-step SHQP

The algorithms in this paper need not guarantee that $\{d(x_i, K)\}$ is nonincreasing. In this section, we give an example of additional conditions needed for the SHQP to have this property. Consider the following algorithm.

Algorithm 6.1. (2-SHQP) Let $K_1, K_2$ be two closed sets in $\mathbb{R}^n$, and $K = K_1 \cap K_2$. This algorithm tries to find a point $x \in K$ using a starting iterate $x_0$.

01 Set $i = 0$
02 Loop
03     Set $x_{i+1}$ to be an element in $P_{K_1}(x_i)$ and $i \leftarrow i + 1$.
04     Set $x_{i+1}$ to be an element in $P_{K_2}(x_i)$ and $i \leftarrow i + 1$.
05     If $\angle x_{i-2}\,x_{i-1}\,x_i < \pi/2$, then
06         Set $x_{i+1} = P_{\{x : \langle x - x_{i-1}, x_{i-2} - x_{i-1}\rangle \le 0,\ \langle x - x_i, x_{i-1} - x_i\rangle \le 0\}}(x_i)$.
07     else
08         Set $x_{i+1} = x_i$.
09     end if
10     $i \leftarrow i + 1$
11 end loop

In line 06, $x_{i+1}$ is the projection of $x_i$ onto the polyhedron formed by intersecting the last two halfspaces generated by the projection process. See Figure 6.1 for an illustration of the first few iterates $x_1$, $x_2$ and $x_3$ formed by a single iteration of the loop. If the "if" block in lines 05 to 09 is removed, then the algorithm reduces to an alternating projection algorithm. We now analyze the effectiveness of this "if" block.
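The loop above can be sketched in Python as follows. This is an illustration under assumptions, not the author's code: the projections onto $K_1$ and $K_2$ are supplied as functions, the test in line 05 is implemented through the equivalent sign condition $\langle x_{i-2} - x_{i-1}, x_i - x_{i-1}\rangle > 0$, the two halfspaces of line 06 are stored with unit normals, and the test sets (the unit circle and the line $y_2 = 0.5$ in $\mathbb{R}^2$) and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_two_halfspaces(x, h1, h2, tol=1e-12):
    """Exact projection of x onto the intersection of {<a1, y> <= b1} and {<a2, y> <= b2}."""
    def feasible(y):
        return all(dot(a, y) - b <= tol for a, b in (h1, h2))

    if feasible(x):
        return list(x)
    for a, b in (h1, h2):                 # try one active constraint at a time
        v = dot(a, x) - b
        if v > 0:
            y = [xi - v / dot(a, a) * ai for xi, ai in zip(x, a)]
            if feasible(y):
                return y
    (a1, b1), (a2, b2) = h1, h2           # otherwise both constraints are active
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    r1, r2 = dot(a1, x) - b1, dot(a2, x) - b2
    det = g11 * g22 - g12 * g12
    l1 = (r1 * g22 - g12 * r2) / det
    l2 = (g11 * r2 - g12 * r1) / det
    return [xi - l1 * p - l2 * q for xi, p, q in zip(x, a1, a2)]


def unit_halfspace(normal, point):
    """The halfspace {y : <normal, y - point> <= 0}, stored with a unit normal."""
    n = math.sqrt(dot(normal, normal))
    a = [c / n for c in normal]
    return a, dot(a, point)


def two_shqp(x0, proj1, proj2, loops=8):
    """Sketch of Algorithm 6.1; each pass corresponds to lines 03-10."""
    x = list(x0)
    for _ in range(loops):
        x_prev2 = x                                      # x_{i-2}
        x_prev1 = proj1(x_prev2)                         # line 03
        x = proj2(x_prev1)                               # line 04
        u = [a - b for a, b in zip(x_prev2, x_prev1)]    # x_{i-2} - x_{i-1}
        w = [a - b for a, b in zip(x, x_prev1)]          # x_i - x_{i-1}
        if dot(u, w) > 0:                                # line 05: acute angle at x_{i-1}
            h1 = unit_halfspace(u, x_prev1)
            h2 = unit_halfspace([-c for c in w], x)      # normal x_{i-1} - x_i
            x = project_two_halfspaces(x, h1, h2)        # line 06
        # lines 07-08: otherwise x_{i+1} = x_i, so x is left unchanged
    return x


def proj_circle(x):   # hypothetical K_1: the unit circle in R^2
    r = math.hypot(x[0], x[1])
    return [x[0] / r, x[1] / r]


def proj_line(x):     # hypothetical K_2: the line {y : y[1] = 0.5}
    return [x[0], 0.5]


x = two_shqp([1.0, 0.9], proj_circle, proj_line)
print(x)
```

From this starting point the first pass takes the "else" branch (the angle at $x_1$ is obtuse), after which line 06 fires and the iterates approach $(\sqrt{0.75}, 0.5)$ very quickly.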

Proposition 6.2. (2-SHQP) Consider Algorithm 6.1. Let $\delta \in (0,1)$. Let $x^* \in K$ and let a neighborhood $V$ of $x^*$ be such that

(1) $\langle v, y - z\rangle \le \delta\|v\|\|y - z\|$ for all $y, z \in K_l \cap V$, $l \in \{1,2\}$ and $v \in N_{K_l}(z)$, and

(2) $d(x, K) \le \beta \max_{l\in\{1,2\}} d(x, K_l)$ for all $x \in V$.

Let $x_1, x_2$ be successive iterates of Algorithm 6.1, generated in lines 03 and 04 from the iterate $x_0$. Suppose $B(x_1, (\beta+1)\|x_1 - x_2\|) \subset V$. Let $\theta := \angle x_2\,x_1\,x_0 < \pi/2$. If

$$\delta\,[\beta\cos\theta + (\beta+1)] < \tfrac12\cos\theta, \tag{6.1}$$

then $d(x_3, K) < d(x_2, K)$.

Conditions (1) and (2) are consequences of the super-regularity condition and the local metric inequality, respectively, so they will be satisfied when close to $K$.

Proof. Property (2) and the facts that $x_1 \in K_1$ and $x_2 \in K_2$ imply that

$$d(x_2, K) \le \beta\max_{l\in\{1,2\}} d(x_2, K_l) = \beta\, d(x_2, K_1) \le \beta\|x_1 - x_2\|,$$

so the set $B(x_2, \beta\|x_2 - x_1\|) \cap K$ is not empty. Hence

$$\emptyset \neq B(x_2, \beta\|x_2 - x_1\|) \cap K \subset B(x_1, (\beta+1)\|x_2 - x_1\|) \subset V. \tag{6.2}$$
Let $y$ be any point in $B(x_2, \beta\|x_2 - x_1\|) \cap K$. By property (1), applied to $K_2$ with $z = x_2$ and $v = x_1 - x_2 \in N_{K_2}(x_2)$, we have

$$\langle y - x_2, x_1 - x_2\rangle \le \delta\|y - x_2\|\|x_1 - x_2\| \le \beta\delta\|x_1 - x_2\|^2. \tag{6.3}$$

In other words, $B(x_2, \beta\|x_2 - x_1\|) \cap K \subset H_2$, where $H_2$ is the halfspace defined by $H_2 := \{x : \langle x - x_2, x_1 - x_2\rangle \le \beta\delta\|x_1 - x_2\|^2\}$.

Next, from (6.2), we can make use of an argument similar to (6.3), now applied to $K_1$ with $z = x_1$ and $v = x_0 - x_1$, to prove that $B(x_1, (\beta+1)\|x_2 - x_1\|) \cap K \subset H_1$, where $H_1$ is the halfspace defined by

$$H_1 := \{x : \langle x - x_1, x_0 - x_1\rangle \le (\beta+1)\delta\|x_0 - x_1\|\|x_1 - x_2\|\}.$$

This implies that

$$\emptyset \neq B(x_2, \beta\|x_2 - x_1\|) \cap K \subset H_1 \cap H_2. \tag{6.4}$$



Figure 6.1. This figure illustrates the proof of Proposition 6.2. The dotted lines show the boundaries of $H_1$ and $H_2$.

We refer to Figure 6.1, which shows the two-dimensional cross section containing $x_0$, $x_1$ and $x_2$. The point $x_3$ is also shown in the figure; it is the projection of $x_2$ onto the intersection of the two (unrelaxed) halfspaces in line 06 of Algorithm 6.1. We now calculate the minimal value of $\big\langle \frac{x_2 - x_3}{\|x_2 - x_3\|}, x_2 - x\big\rangle$, where $x$ ranges over $H_1 \cap H_2$. This minimal value can be seen to be $d_1 - d_2 - d_3$, where $d_1$, $d_2$ and $d_3$ are the distances indicated in Figure 6.1. These distances can be calculated to be

$$d_1 = \|x_2 - x_3\| = \|x_1 - x_2\|\cot\theta, \qquad d_2 = \beta\delta\|x_1 - x_2\|\cot\theta, \qquad\text{and}\qquad d_3 = (\beta+1)\delta\|x_1 - x_2\|/\sin\theta.$$

We can check that (6.1) is equivalent to $d_2 + d_3 < \frac12 d_1$. As long as (6.1) holds, the region $H_1 \cap H_2$ lies on the same side as $x_3$ of the perpendicular bisector of the points $x_2$ and $x_3$. Hence all the points in $H_1 \cap H_2$ are closer to $x_3$ than to $x_2$. Since $H_1 \cap H_2$ contains all the points in $P_K(x_2)$ by (6.4), we thus have $d(x_3, K) < d(x_2, K)$ as needed. □
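The equivalence between (6.1) and $d_2 + d_3 < \frac12 d_1$ follows after multiplying by $\sin\theta > 0$. The Python sketch below (hypothetical helper names, with $\|x_1 - x_2\| = 1$) checks it on a few parameter values and also computes the largest angle allowed by (6.1): rearranging gives $\cos\theta\,(\frac12 - \beta\delta) > \delta(\beta+1)$, so (6.1) holds exactly for $\theta < \cos^{-1}\frac{\delta(\beta+1)}{1/2 - \beta\delta}$ when $\beta\delta < \frac12$, and never otherwise.

```python
import math


def condition_61(delta, beta, theta):
    """Condition (6.1) of Proposition 6.2."""
    return delta * (beta * math.cos(theta) + (beta + 1)) < 0.5 * math.cos(theta)


def proof_distances(delta, beta, theta, gap=1.0):
    """d1, d2, d3 from the proof of Proposition 6.2, with gap = ||x1 - x2||."""
    d1 = gap / math.tan(theta)
    d2 = beta * delta * gap / math.tan(theta)
    d3 = (beta + 1) * delta * gap / math.sin(theta)
    return d1, d2, d3


def theta_max(delta, beta):
    """Supremum of the angles theta in (0, pi/2) satisfying (6.1)."""
    if beta * delta >= 0.5:
        raise ValueError("(6.1) cannot hold when beta * delta >= 1/2")
    c = delta * (beta + 1) / (0.5 - beta * delta)
    return math.acos(c) if c < 1 else 0.0


for delta, beta, theta in [(0.01, 2.0, math.pi / 4), (0.2, 2.0, math.pi / 3), (0.05, 1.5, 1.2)]:
    d1, d2, d3 = proof_distances(delta, beta, theta)
    print(condition_61(delta, beta, theta), d2 + d3 < 0.5 * d1)
print(theta_max(0.01, 2.0))
```

For $\delta = 0.01$ and $\beta = 2$ the admissible angle is about $86^\circ$, while larger $\delta$ quickly shrinks it, in line with the remark that follows.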

Note that if $\theta < \pi/2$ is too close to $\pi/2$, then condition (6.1) can fail. In fact, if $\theta > \cos^{-1}\delta$, one can check that condition (1) in Proposition 6.2 does not rule out $x_2$ being inside $K_1$, so there would be no point in calculating $x_3$. The supporting halfspaces as calculated by the projection process can be too aggressive for super-regular sets. For example, one can draw a manifold in $\mathbb{R}^2$ such that the intersection of the manifold and a halfspace generated by the projection process consists of only one point. Two halfspaces of this kind would give an empty intersection with the manifold. Therefore, one has to relax the halfspaces.

We remark that the procedure in (6.3) shows how to construct halfspaces under the super-regularity condition, and can be incorporated into Algorithm 5.1 as long as we have a good estimate for $\delta$.

7. Global strategies

In this section, we discuss methods for when local methods for the nonconvex SIP are not appropriate. In Example 7.1, we show that while the theory for the convex SIP suggests that one should not backtrack, backtracking is nevertheless suggested for the nonconvex problem; this can lead to the Maratos effect and slow down convergence.

The problem of finding a point in the intersection of a finite number of closed sets $K_l \subset \mathbb{R}^n$, where $l = 1,\dots,m$, can be equivalently cast as the problem of finding a point that minimizes $f(x)$, where $f(x)$ can be chosen as

$$d\Big(x, \bigcap_{l=1}^m K_l\Big), \tag{7.1a}$$

$$\sum_{l=1}^m d(x, K_l)^2, \tag{7.1b}$$

$$\max_{l\in\{1,\dots,m\}} \{d(x, K_l)\}, \tag{7.1c}$$

or some other function similar to those presented above. In the event that the intersection $\bigcap_{l=1}^m K_l$ is nonempty, any point in $K := \bigcap_{l=1}^m K_l$ would be a global minimizer of $f(\cdot)$. The function in (7.1a) is the function of choice, but $d(x, K)$ can only be estimated well locally with the techniques in Section 3. Instead of trying to minimize $f(\cdot)$, the problem that really needs to be solved is that of finding an $x$ in $\{x : f(x) \le 0\}$. This is a simpler problem, which can be solved by a subgradient projection method that is somewhat simpler than the minimization problem. A bundle method [HUL93, BGLS06] adapted for a nonconvex objective function can be used to solve the nonconvex SIP. (See also [lfWWXTf, Pan14] for the principles of a finitely convergent algorithm for this setting. This idea of finite convergence goes back to [PM79, MPH81, Fuk82, PI88] for the convex case and the smooth case.)

A standard procedure in optimization algorithms is the line search procedure. A search direction is calculated, and the next iterate is obtained by a line search along this search direction. For the nonconvex SIP, the search direction can be calculated by projecting onto a polyhedron formed by intersecting a number of previously generated halfspaces. There are two ways we can backtrack to obtain decrease in some objective function (in (7.1) or otherwise). Firstly, one can remove halfspaces that describe the polyhedron. It is sensible to remove the older halfspaces since they become less reliable. This has the effect of reducing the distance from the current iterate to the polyhedron, so the search direction is more likely to give decrease. The problem of projecting onto the polyhedron with one halfspace removed can be solved effectively from the old solution using a warm-start quadratic programming algorithm (for example, the active set method of [Gol86]). Secondly, one can use the usual backtracking line search.
We note, however, that in the pursuit of obtaining decrease in the objective function, we may encounter the Maratos effect (see [NW06, Section 15.5], who in turn cited [Mar78]), which slows convergence.

Example 7.1. (Backtracking slows convergence) In this example, we show how the SHQP strategy for a convex SIP converges quickly for a problem, but would be slowed down by backtracking when treated as a nonconvex SIP. Consider the sets $K_1, K_2 \subset \mathbb{R}^3$, where $K_1 = H_1$ and $K_2 = H_2 \cap H_3$, and the halfspaces $H_1$, $H_2$ and $H_3$ are defined by

$$H_1 := \{x \in \mathbb{R}^3 : (0, 1, 0)x \le 0\},$$

$$H_2 := \{x \in \mathbb{R}^3 : (1/3, -1, 0)x \le -2\},$$

$$H_3 := \{x \in \mathbb{R}^3 : (-1, -1, 1)x \le 0\}.$$

Let the point $x_0$ be $(0, 1, 0)$. The projections of $x_0$ onto $K_1$ and $K_2$ generate the halfspaces $H_1$ and $H_2$ respectively. The projection of $x_0$ onto $H_1 \cap H_2$ is $x_1 := (-6, 0, 0)$. We can calculate that

$$d(x_0, K_1) = 1, \quad d(x_0, K_2) = 3/\sqrt{10}, \quad d(x_1, K_1) = 0 \quad\text{and}\quad d(x_1, K_2) = 2\sqrt{3}. \tag{7.2}$$

The projection of $x_1$ onto $K_2$ generates $H_3$, and once we project $x_1$ onto $H_1 \cap H_2 \cap H_3$, we find a point in $K_1 \cap K_2$. If this SIP were solved as a nonconvex SIP, the values in (7.2), substituted into the objective function (7.1b) or (7.1c), suggest that one has to backtrack in some manner, and this actually slows down the convergence. (See Figure 7.1 for an illustration.)
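The values in (7.2), and the resulting increase of (7.1b) and (7.1c) from $x_0$ to $x_1$, can be reproduced with the short Python sketch below. It relies on a hand-written exact projection onto the intersection of two halfspaces (a hypothetical helper, adequate here because every projection in the example involves at most two constraints).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_two_halfspaces(x, h1, h2, tol=1e-12):
    """Exact projection of x onto the intersection of {<a1, y> <= b1} and {<a2, y> <= b2}."""
    def feasible(y):
        return all(dot(a, y) - b <= tol for a, b in (h1, h2))

    if feasible(x):
        return list(x)
    for a, b in (h1, h2):                 # one active constraint
        v = dot(a, x) - b
        if v > 0:
            y = [xi - v / dot(a, a) * ai for xi, ai in zip(x, a)]
            if feasible(y):
                return y
    (a1, b1), (a2, b2) = h1, h2           # both constraints active
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    r1, r2 = dot(a1, x) - b1, dot(a2, x) - b2
    det = g11 * g22 - g12 * g12
    l1 = (r1 * g22 - g12 * r2) / det
    l2 = (g11 * r2 - g12 * r1) / det
    return [xi - l1 * p - l2 * q for xi, p, q in zip(x, a1, a2)]


H1 = ([0.0, 1.0, 0.0], 0.0)
H2 = ([1 / 3, -1.0, 0.0], -2.0)
H3 = ([-1.0, -1.0, 1.0], 0.0)


def d_K1(x):
    """Distance to K1 = H1 (the normal of H1 has unit length)."""
    return max(0.0, dot(H1[0], x) - H1[1])


def d_K2(x):
    """Distance to K2, the intersection of H2 and H3."""
    return math.dist(x, project_two_halfspaces(x, H2, H3))


def f_b(x):  # objective (7.1b)
    return d_K1(x) ** 2 + d_K2(x) ** 2


def f_c(x):  # objective (7.1c)
    return max(d_K1(x), d_K2(x))


x0 = [0.0, 1.0, 0.0]
x1 = project_two_halfspaces(x0, H1, H2)   # SHQP step onto the intersection of H1 and H2
print(x1, d_K1(x0), d_K2(x0), d_K1(x1), d_K2(x1))
print(f_b(x0), f_b(x1), f_c(x0), f_c(x1))
```

Both objectives increase from $x_0$ to $x_1$ ((7.1b) goes from 1.9 to 12), even though $x_1$ is only one projection away from a point of $K_1 \cap K_2$.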



Figure 7.1. This figure illustrates the two-dimensional cross section $\{x \in \mathbb{R}^3 : x_3 = 0\}$ in Example 7.1. Note that the projection of $x_1$ onto $K_2$ lies outside this cross section.

We recall that the method of averaged projections for finding a point in $\bigcap_{l=1}^m K_l$, where $K_l \subset \mathbb{R}^n$ for all $l \in \{1,\dots,m\}$, is defined by

$$x_{i+1} = \frac{1}{m}\sum_{l=1}^m P_{K_l}(x_i). \tag{7.3}$$

It was noticed that this formula corresponds to the method of alternating projections between the two sets in $\mathbb{R}^{nm}$ defined by

$$D := \{(x, x, \dots, x) : x \in \mathbb{R}^n\} \quad\text{and}\quad \mathbf{K} := K_1 \times K_2 \times \cdots \times K_m.$$
It is easy to see that $f(x_{i+1}) \le f(x_i)$ if $x_{i+1}$ is defined by (7.3) and $f(\cdot)$ is defined by (7.1b), since $\sqrt{f(x)}$ is the distance of $(x, \dots, x) \in D$ to $\mathbf{K}$. Moreover, if $f(x_{i+1}) = f(x_i)$, then $x_{i+1} = x_i$, that is, $x_i$ is a fixed point of (7.3).
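The monotonicity of (7.1b) along (7.3) is easy to observe numerically. Since $x_{i+1}$ is the mean of the points $P_{K_l}(x_i)$, one has $f(x_{i+1}) \le \sum_l \|x_{i+1} - P_{K_l}(x_i)\|^2 = f(x_i) - m\|x_{i+1} - x_i\|^2$, for convex and nonconvex sets alike. The Python sketch below runs averaged projections on a hypothetical pair of sets in $\mathbb{R}^2$ (the unit circle and the line $y_2 = 0.5$; any sets with computable projections would do) and records $f$ along the way; all names are illustrative.

```python
import math


def proj_circle(x):   # hypothetical K_1: the unit circle in R^2
    r = math.hypot(x[0], x[1])
    return [x[0] / r, x[1] / r]


def proj_line(x):     # hypothetical K_2: the line {y : y[1] = 0.5}
    return [x[0], 0.5]


PROJECTORS = [proj_circle, proj_line]


def f_b(x):
    """Objective (7.1b): sum of squared distances to the sets."""
    return sum(math.dist(x, P(x)) ** 2 for P in PROJECTORS)


def averaged_projection(x):
    """One step of (7.3)."""
    pts = [P(x) for P in PROJECTORS]
    return [sum(coords) / len(pts) for coords in zip(*pts)]


x = [1.0, 0.9]
values = [f_b(x)]
for _ in range(50):
    x = averaged_projection(x)
    values.append(f_b(x))
print(values[:5], values[-1])
```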

In the SHQP strategy for nonconvex problems, we can use backtracking to find the next iterate $x_i$ of the form $tP_{F_i}(x_{i-1}) + (1-t)x_{i-1}$, where $t \in (0,1]$ and $F_i$ is the polyhedron defined by intersecting previously generated halfspaces as in Algorithm 5.1. We can instead find an iterate of the form

$$tP_{F_i}(x_{i-1}) + (1-t)\frac{1}{m}\sum_{l=1}^m P_{K_l}(x_{i-1}).$$
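A minimal sketch of such a backtracking rule, applied to the data of Example 7.1, follows. The halving schedule $t = 1, \frac12, \frac14, \dots$, the simple-decrease test on (7.1b) and the fallback to the averaged projection point (7.3) are assumptions made for illustration; the text above does not fix a particular rule, and the helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_two_halfspaces(x, h1, h2, tol=1e-12):
    """Exact projection of x onto the intersection of {<a1, y> <= b1} and {<a2, y> <= b2}."""
    def feasible(y):
        return all(dot(a, y) - b <= tol for a, b in (h1, h2))

    if feasible(x):
        return list(x)
    for a, b in (h1, h2):
        v = dot(a, x) - b
        if v > 0:
            y = [xi - v / dot(a, a) * ai for xi, ai in zip(x, a)]
            if feasible(y):
                return y
    (a1, b1), (a2, b2) = h1, h2
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    r1, r2 = dot(a1, x) - b1, dot(a2, x) - b2
    det = g11 * g22 - g12 * g12
    l1 = (r1 * g22 - g12 * r2) / det
    l2 = (g11 * r2 - g12 * r1) / det
    return [xi - l1 * p - l2 * q for xi, p, q in zip(x, a1, a2)]


H1 = ([0.0, 1.0, 0.0], 0.0)      # K1 = H1
H2 = ([1 / 3, -1.0, 0.0], -2.0)  # K2 = H2 intersected with H3
H3 = ([-1.0, -1.0, 1.0], 0.0)


def proj_K1(x):
    a, b = H1
    v = dot(a, x) - b
    return list(x) if v <= 0 else [xi - v / dot(a, a) * ai for xi, ai in zip(x, a)]


def proj_K2(x):
    return project_two_halfspaces(x, H2, H3)


PROJECTORS = [proj_K1, proj_K2]


def f_b(x):
    """Objective (7.1b)."""
    return sum(math.dist(x, P(x)) ** 2 for P in PROJECTORS)


def backtracked_iterate(x_prev, shqp_point, max_halvings=30):
    """Try t = 1, 1/2, 1/4, ... on t*shqp_point + (1-t)*(average of projections)
    until (7.1b) decreases; otherwise fall back to the averaged projection (7.3)."""
    pts = [P(x_prev) for P in PROJECTORS]
    avg = [sum(c) / len(pts) for c in zip(*pts)]
    f_prev = f_b(x_prev)
    t = 1.0
    for _ in range(max_halvings):
        cand = [t * s + (1 - t) * a for s, a in zip(shqp_point, avg)]
        if f_b(cand) < f_prev:
            return t, cand
        t /= 2
    return 0.0, avg


x0 = [0.0, 1.0, 0.0]
x1 = project_two_halfspaces(x0, H1, H2)   # SHQP point P_{F_1}(x0)
t, x_next = backtracked_iterate(x0, x1)
print(t, x_next, f_b(x0), f_b(x_next))
```

Here the full SHQP step $t = 1$ is rejected because (7.1b) would rise from 1.9 to 12, which is exactly the slowdown described in Example 7.1.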

Other heuristics for the nonconvex problem are also possible. For example, if one is certain that the intersection is nonempty, then one can try to avoid points in the open balls $B(x_i, d(x_i, K_l))$ for all $i \ge 0$ and $l \in \{1,\dots,m\}$, since these balls do not meet $K_l$. If some of the sets are spectral sets (i.e., sets of symmetric matrices described solely by their eigenvalues), then the results in [LM08] can also be applied.

8. Conclusion 

We hope our results make the case that in solving feasibility problems involving super-regular sets, one should use the SHQP procedure as much as possible to accelerate convergence once close enough to the intersection. The QPs to be solved can be kept to a manageable size if we combine the SHQP procedure with projection methods as in Algorithm 3.1.